Abstract. Consider an asynchronous network in a shared-memory environment
consisting of n nodes. Assume that up to f of the nodes might be Byzantine
($n > 12f$), where the adversary is full-information and dynamic (sometimes called
adaptive). In addition, the non-Byzantine nodes may undergo transient failures.
Nodes advance in atomic steps, which consist of reading all registers, performing
some calculation and writing to all registers.
The three main contributions of the paper are: first, the clock-function problem is
defined, which is a generalization of the clock synchronization problem. This
generalization encapsulates previous clock synchronization problem definitions while
extending them to the current paper's model. Second, a randomized asynchronous
self-stabilizing Byzantine tolerant clock synchronization algorithm is presented.
In the construction of the clock synchronization algorithm, a building block that
ensures different nodes advance at similar rates is developed. This feature is the
third contribution of the paper. It is self-stabilizing and Byzantine tolerant and
can be used as a building block for different algorithms that operate in an
asynchronous self-stabilizing Byzantine model.
The convergence time of the presented algorithm is exponential. Observe that
in the asynchronous setting the best known full-information dynamic Byzantine
agreement also has an expected exponential convergence time.

1 Introduction

When tackling problems in distributed systems, there are many previously
developed building blocks that assist in solving the problem. Some of these building
blocks allow one to design a solution under "easy" assumptions, then automatically
transform it to a more realistic environment. For example, it is easier to
construct an algorithm in the synchronous model, then add an underlying
synchronizer (see [4]) to adapt the solution to an asynchronous model. Similarly,
developing a self-stabilizing algorithm can be challenging; instead, one can develop
a non-self-stabilizing algorithm, and use a stabilizer ([1]) to address transient
errors.

Among the different models of distributed systems, specific models received 
more attention than others; and therefore the availability and versatility of build- 
ing blocks differ from one model to another. For example, the synchronous no- 
failures model can automatically be extended in many different directions: asyn- 
chronous no-failures, synchronous self-stabilizing, asynchronous self-stabilizing, 
etc. Zooming in on the world of self-stabilization, there are various model-converters:
between shared-memory and message-passing, from an id-based to a uniform
system, etc. (see [11]).



However, when moving away from the commonly researched models, the avail- 
ability of such model-converters diminishes. In the current paper we are interested 
in an asynchronous network with Byzantine nodes and transient failures. That is, 
we aim at solving a problem in a way that is Byzantine tolerant, self-stabilizing 
and operates in an asynchronous network. The Byzantine adversary is assumed to 
be full-information and dynamic (sometimes called adaptive). There are few pre- 
vious works that operate in similar models [18,19,17]. In these works non-faulty 
neighbors of Byzantine nodes may reach undesired states. However, as far as we 
know, this is the first work operating in such a setting in which non-Byzantine 
nodes reach their desired state even if they have Byzantine neighbors. 

The problem we solve in the current work is clock synchronization. Our solution
makes two simplifying assumptions: a) a "centralized daemon", i.e., each
node can run the entire algorithm as an atomic step; b) an "en masse scheduler"
that adheres to the following: if p gets scheduled twice, then $n - 2f$ other
non-faulty nodes get scheduled in between (formally defined in Definition 2). Under
these assumptions we define and solve the clock synchronization problem in an
asynchronous network while tolerating both Byzantine and transient faults. The
solution is a randomized algorithm with expected convergence time of $O(3^{n-2f})$.

Both assumptions can be seen as "building blocks that do not yet exist". 
When constructing a self-stabilizing asynchronous algorithm (without Byzantine
nodes), it is reasonable to assume a centralized daemon due to the mutual
exclusion algorithm of Dijkstra (see [8]). Thus, once an equivalent algorithm can be
devised for this paper's model, the first assumption can be removed. In Section 6 
we provide an algorithm that implements the second assumption, thus allowing 
its usage without reducing the generality of a solution that uses it. 

Due to our dependence on the first assumption, we consider this work as 
a step towards a full solution of the clock synchronization in an asynchronous 
network that is self-stabilizing and Byzantine tolerant. We hope it leads to fur- 
ther research of this model, one which will produce an equivalent of Dijkstra's 
algorithm operating in the current work's model. 

Related Work Being able to introduce consistent "time" in a distributed system 
is an important task, however difficult it may be in some models. For many 
distributed tasks the crux of the problem is to synchronize the operations of the 
different nodes. One method of doing this is using some sort of "time-awareness" 
at each node, ensuring that different "clocks" advance in a relatively synchronized 
manner. Therefore, it is interesting to devise such algorithms that are highly 
robust. 

In the past, various models were considered, ranging from synchronous
systems (see [10,12,15]), in which all nodes receive a common signal simultaneously
at regular intervals; through bounded-delay systems (see [9,?]), in which only a
bound on the message delivery time is given; to completely asynchronous systems
(see [7,14]), in which only an eventual (but not bounded) delivery of messages
is assumed. Independent of the timing model, different fault tolerance
assumptions are considered. In the self-stabilizing fault paradigm, all nodes follow
their protocol but may start with arbitrary values of their variables and program
counter (see [11]). Another commonly assumed fault type is fail-stop faults,
in which some of the nodes may crash and cease to participate in the protocol.
Lastly, Byzantine faults are considered to be the most severe fault model, as they
assume that the faulty nodes can behave arbitrarily and even collude in trying
to keep the system from reaching its designated goals (see [3,16]).

"Knowing what time it is" acquires different flavors in different models. In 
systems without any faults, it is usually assumed that each node has a physical 
clock, and these clocks differ from node to node. The main issue is to
synchronize the different clocks as closely as possible. In a synchronous, self-stabilizing
and Byzantine tolerant model, this problem was termed "digital clock
synchronization", and consisted of having all nodes agree on some bounded integer and
increase it every round (see [2,10,12,15]).

The traditional concept of "clock synchronization" does not hold in an asyn- 
chronous environment. Therefore, previous work has defined "phase clocks" or 
"unison" (see [7,14]), which states that each node has an integer valued clock, 
and neighboring nodes should be at most ±1 from each other. It is shown (for 
example, see [14]) how such "synchronization" is sufficient in solving many prob- 
lems. 

Most previous works in the asynchronous model considered self-stabilizing 
or Byzantine faults, but not both. In the current work, we consider both fault 
models. However, defining what "telling the time" means in an asynchronous, 
self-stabilizing and Byzantine tolerant manner is a bit tricky. To address that, a 
new notion of "knowing what time it is" is introduced: a clock function. 

All previous clock-synchronization (or phase-clock, or unison, etc.) algorithms
can be viewed in the following way: each time a non-faulty node is running, it
executes some piece of code ("function") that returns a value ("the clock value")
and there are constraints on the range of different non-faulty nodes' values. In
the synchronous digital clock synchronization problem, the function returns an
integer value, and we require that all nodes executing the function in the same
round receive exactly the same value and that a node executing the function in
consecutive rounds receives consecutive values. In an asynchronous network (e.g., in
[14]), different nodes may execute their clock-functions at different times and at
different rates. The constraint on the returned values can be described as follows:
given a configuration of the system, if p would execute its clock-function and
receive a value v, then any neighbor of p that would execute its clock-function at
the same configuration would receive a value that differs from v by at most 1.

In the current work it is assumed that the network is fully connected, which 
means that every node is connected to every other node. Therefore, the constraint 
of the clock-function is simplified, informally requiring any two non-faulty nodes 
that execute the clock-function to receive values that are at most one apart.



In synchronous networks the problem of self-stabilizing Byzantine tolerant 
clock synchronization is equivalent to the problem of Byzantine agreement, in the 
sense that any solution to the self-stabilizing Byzantine tolerant clock synchro- 
nization problem is also a solution to the (non-self-stabilizing) Byzantine agree- 
ment problem. In asynchronous networks the best known full-information dy- 
namic Byzantine agreement has expected exponential convergence time (see [5]). 
While the synchronous equivalence between clock synchronization and Byzantine 
agreement does not transfer to the asynchronous setting (as it strongly uses the 
fact that all nodes agree on the exact same clock value), it raises the possibil- 
ity that improving the result of this paper will require usage of new techniques. 
That is, it is not known yet if the self-stabilizing clock synchronization of the 
current work can be used to solve Byzantine agreement. However, if it can be 
used, then improving the exponential convergence of the current work would lead 
to an improvement of the best known asynchronous Byzantine agreement against 
a dynamic full-information adversary. 

Contribution Our contribution is three-fold. First, we define the clock-function 
problem, which is a generalization of the clock synchronization problem. This 
definition provides a meaningful extension of the clock synchronization problem 
to the asynchronous self-stabilizing Byzantine tolerant model. 

Second, we provide an algorithm that solves the clock-function problem in
the above model. Using shared memory, it has an expected $O(3^{n-2f})$
convergence time, independent of the wraparound value of the clock. Notice that for
synchronous networks, the first two contributions were already presented in [12].
Our contribution is with respect to asynchronous networks.

Lastly, in Section 6 we construct a building block that bounds the relative 
rates at which different non-faulty nodes progress with respect to other non-faulty 
nodes. More specifically, between any two atomic steps of a non-faulty node p, 
there are guaranteed to be atomic steps of $n - 2f$ other non-faulty nodes (see
the "en masse scheduler" assumption described in the introduction). We
postulate that this building block can be used in other asynchronous self-stabilizing
Byzantine tolerant settings. 

Overview We start by defining the model (see Section 2). A subset of all pos- 
sible runs is defined and denoted "en masse" (see Definition 2). Section 3 dis- 
cusses different aspects of defining clock synchronization in an asynchronous, self- 
stabilizing, Byzantine tolerant environment; and defines a clock function, which 
is a generalization of the clock synchronization problem. 

Section 4 introduces Async-Clock, an algorithm that solves the problem at
hand. Section 5 contains the correctness proof for Async-Clock. Both Section 4
and Section 5 are correct only for en masse runs, for which Async-Clock
requires fault redundancy of $n > 6f$.



In Section 6 the algorithm EnMasse is presented, which transforms any run
into an en masse run, leading to the correctness of Async-Clock for any run.
However, the transformation done by EnMasse increases the fault redundancy
of Async-Clock to $n > 12f$. Lastly, Section 7 concludes with a discussion of
the results.



2 Distributed Model 

The system is composed of a set of n nodes denoted by V. Every pair of nodes
$p, q \in V$ communicates via shared memory (i.e., a fully-connected
communication graph), in an asynchronous manner. That is, p and q share two registers:
$R_{p,q}$ and $R_{q,p}$.$^1$ Register $R_{p,q}$ is written by p and read by q.$^2$ A configuration C
describes the global state of the system and consists of the states of each node
and the state of each register. A run of the system is an infinite sequence of
configurations $C_0 \to C_1 \to \cdots \to C_r \to \cdots$, such that the configuration $C_{r+1}$ is
reachable from configuration $C_r$ by a single node's atomic step. In the context
of the current paper, an atomic step consists of reading all registers, performing
some calculation and then writing to all registers.
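
The register layout and the shape of an atomic step can be sketched as follows. This is only an illustration of the model, not code from the paper: the class name `SharedMemory`, the callback `compute`, and the numbering of nodes from 0 to n - 1 are assumptions made here.

```python
from typing import Callable, Dict, List, Tuple


class SharedMemory:
    """Pairwise registers: registers[(p, q)] is written by p and read by q."""

    def __init__(self, n: int, initial: int = 0) -> None:
        self.n = n
        # One register per ordered pair of nodes, including the pair (p, p).
        self.registers: Dict[Tuple[int, int], int] = {
            (p, q): initial for p in range(n) for q in range(n)
        }

    def atomic_step(self, q: int, compute: Callable[[List[int]], int]) -> int:
        """Node q reads every register written to it, computes, then writes."""
        values_read = [self.registers[(p, q)] for p in range(self.n)]
        new_value = compute(values_read)
        # A non-faulty node writes the same value to all of its registers.
        for p in range(self.n):
            self.registers[(q, p)] = new_value
        return new_value


mem = SharedMemory(n=3)
mem.atomic_step(1, lambda vals: max(vals) + 1)
```

A Byzantine node is not bound by `atomic_step`: it may place a different value in each of its registers, which is exactly what the pairwise layout permits.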

The system is assumed to start from an arbitrary initial configuration $C_0$. We
show that eventually, in the presence of continuous Byzantine behavior, the
system becomes synchronized.

In addition to transient faults, up to f of the nodes may be Byzantine. The
Byzantine adversary has full information, i.e., it can read the values in every
node's memory$^3$ and in the shared registers between any two nodes. There are
no private channels and the adversary is computationally unbounded. Moreover,
the adversary is dynamic, which means it may choose to "capture" a non-faulty
node at any stage of the algorithm. However, once the adversary has "captured"
f nodes in some run, it cannot affect other nodes and in a sense becomes static.
The results of this paper can be extended to the setting in which the adversary 
continues to be dynamic throughout the run, as long as the adversary is limited 
by the rate at which it can release and capture non-faulty nodes. We do not 
present this extension in the current paper for the sake of clarity. However, one 
can easily be convinced that it applies, once the main points of the work are 
explained. 

$^1$ Pair-wise communication is used to allow Byzantine nodes to present different values to
different nodes; as opposed to assuming a single register per-node that can be read by all
other nodes.

$^2$ For simpler presentation we assume that p writes and reads $R_{p,p}$.

$^3$ Actually, the presented algorithm stores all its state in the shared registers.

The adversary also has full control of the scheduling of atomic steps and
can use its full information knowledge in this scheduling. However, for a clock
synchronization algorithm to be meaningful, runs in which some of the non-faulty
nodes never get to perform atomic steps should be excluded. Thus, throughout
the paper only fair runs are considered:

Definition 1. A run is fair if every non-faulty node performs infinitely many 
atomic steps. 

A subset of all fair runs is defined: 

Definition 2. A run T is en masse with respect to node p if for any 2 atomic
steps p performs during T (say at configurations C and C', respectively) there are
at least $n - 2f$ non-faulty nodes that perform atomic steps between C and C'.

A run T is en masse if it is fair and it is en masse with respect to all
non-faulty nodes.
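
The en masse condition can be checked mechanically on a finite prefix of a run. The sketch below is illustrative only: the function name `is_en_masse_wrt` and the encoding of a run as a list of node identifiers (one per atomic step) are assumptions made here, and a finite prefix can of course refute the property but never establish it for an infinite run.

```python
from typing import List, Optional, Set


def is_en_masse_wrt(schedule: List[int], p: int,
                    non_faulty: Set[int], n: int, f: int) -> bool:
    """Check Definition 2 for node p on a finite schedule prefix.

    schedule[i] is the node that performs the i-th atomic step.
    Checking consecutive steps of p suffices: the span between any two
    steps of p contains the span between two consecutive ones.
    """
    last: Optional[int] = None      # index of p's previous atomic step
    for i, node in enumerate(schedule):
        if node != p:
            continue
        if last is not None:
            between = {x for x in schedule[last + 1:i] if x in non_faulty}
            if len(between) < n - 2 * f:
                return False
        last = i
    return True


# n = 4, f = 1, so n - 2f = 2 non-faulty nodes must step between p's steps.
good = is_en_masse_wrt([0, 1, 2, 0], 0, {0, 1, 2}, n=4, f=1)
bad = is_en_masse_wrt([0, 3, 1, 0], 0, {0, 1, 2}, n=4, f=1)
```

In the second schedule node 3 is faulty and does not count, so only one non-faulty node steps between the two steps of node 0.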

As stated in the overview, en masse runs are needed for Async-Clock to operate 
correctly. Assuming all runs are en masse runs, the fault tolerance redundancy 
required is $n > 6f$ (see Lemma 3 for an example of the necessity of $n > 6f$).
However, in Section 6 we show how to remove the requirement of en masse runs,
at the cost of increasing the fault tolerance redundancy to $n > 12f$.

3 Problem Definition 

Before formally defining the problem at hand, consider the properties a dis- 
tributed clock synchronization algorithm should have in an asynchronous setting: 

1. (clock-value) a means of locally computing the current clock value at any 
non-faulty node; 

2. (agreement) if different non-faulty nodes compute the clock value close (in
time) to each other, they should obtain similar values;

3. (liveness) if non-faulty nodes continuously recompute clock values, then they 
should obtain increasing values. 

For example, in a synchronous network, the clock synchronization problem is
usually formulated as: (clock-value) each node p has a bounded integer counter
$\mathit{Clock}_p$; (agreement) for any two non-faulty nodes p, q it holds that
$\mathit{Clock}_p = \mathit{Clock}_q$; (liveness) if $\mathit{Clock}_p = z$ at round r then $\mathit{Clock}_p = z + 1$
at round $r + 1$. Since the clock is bounded, the previous sentence is slightly
modified: "if $\mathit{Clock}_p = z$ at round r, then $\mathit{Clock}_p = z + 1 \pmod{k}$ at round
$r + 1$"; where k represents the wrap-around value of the clock.

Notation 3.1. Denote by $a \oplus_k b$ the value $(a + b) \bmod k$.

In an asynchronous setting, it is impossible to ensure that all nodes update 
their clocks simultaneously. Thus, the "agreement" property requires a relaxed 
version as opposed to the synchronous setting's stricter version. In addition, the 
"liveness" property is somewhat tricky to define, due to the Byzantine presence. 



To illustrate the difficulty, consider a set of f Byzantine nodes that "behave as if"
they were non-faulty, and they repeatedly recompute the clock value. According 
to the definition above, the clock value will increase continuously, even though 
non-faulty nodes did not perform a single step. Therefore, such a clock synchro- 
nization algorithm is useless, as the Byzantine nodes can make it reach any clock 
value; in other words, the Byzantine nodes "control" the clock value. 

It is not immediately clear how these "benign" Byzantine nodes can be dif- 
ferentiated from the non-faulty nodes. The following definitions address such 
difficulties, and present a formalization of the clock synchronization problem in 
this paper's model. 

Definition 3. A value v' is at most d ahead of v if there exists j, $0 \le j \le d$,
such that $v \oplus_k j = v'$. Denote "v' is at most d ahead of v" by $v \preceq_d v'$.
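
The modular addition and the "at most d ahead" relation translate directly into code. This is a minimal sketch; the function names `add_mod` and `at_most_ahead` are chosen here for illustration, and the bounds on j are read as inclusive (j ranges over 0, 1, ..., d), which is an assumption about the intended definition.

```python
def add_mod(a: int, b: int, k: int) -> int:
    """Addition modulo the wraparound value k."""
    return (a + b) % k


def at_most_ahead(v: int, v_prime: int, d: int, k: int) -> bool:
    """True iff v_prime equals v plus some j (mod k) with 0 <= j <= d."""
    return any(add_mod(v, j, k) == v_prime for j in range(d + 1))


# With k = 10, the value 1 is two steps ahead of 9 (9 -> 0 -> 1),
# whereas 9 is eight steps ahead of 1.
wraps = at_most_ahead(9, 1, d=2, k=10)
not_symmetric = at_most_ahead(1, 9, d=2, k=10)
```

The relation is directional: on the ring of k values, "ahead" always means moving in the increasing direction.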

Definition 4 addresses the "clock-value" property: 

Definition 4. A clock-function $\mathcal{F}$ is an algorithm that when executed during
an atomic step returns a value in the range $\{0, \ldots, k - 1\}$. Denote by $\mathcal{F}_p(C)$ the
value returned when p executes $\mathcal{F}$ during an atomic step at configuration C.

Consider the "agreement" property: it requires that different non-faulty nodes 
that compute the clock value simultaneously, receive similar values. What does 
"simultaneously" mean in an asynchronous setting? It can be captured by require- 
ments on the clock values computed in different runs. In addition, the interference 
caused by Byzantine nodes in different runs needs to be captured. 

Informally, "agreement" requires the following from $\mathcal{F}$: given a configuration
C, no matter what the adversary does, if different non-faulty nodes execute $\mathcal{F}$
they receive values that are close to each other. Definition 5, Definition 6 and 
Definition 7 formally state the "agreement" requirement. First, "no matter what 
the adversary does" is formally defined: 

Definition 5. An adversarial move from a configuration C is any configura- 
tion reachable by an arbitrary sequence of atomic steps of faulty nodes only. 

Second, "different non-faulty nodes execute $\mathcal{F}$" is divided into two cases. Let
p, q be non-faulty nodes. The first case considers the computed value of p (when
calculating $\mathcal{F}$ on C) as opposed to the computed value of q (see Definition 6).
The second case considers the computed value of q after p has computed its value
(see Definition 7). Both cases require the computed values of p and q to be close
to each other. 

Definition 6. A configuration C is $\ell$-well-defined (with respect to some
clock-function $\mathcal{F}$) if there is a value v s.t. for any non-faulty node p and every
adversarial move C' from C it holds that $v \preceq_\ell \mathcal{F}_p(C')$. v is called "a defined value" at
C. (There may be more than one such v.)



Informally, Definition 6 says that C is $\ell$-well-defined if there is an intrinsic
value v such that any adversarial move cannot increase the clock-value by more
than $\ell$. Thus, any two non-faulty nodes p, q (in different run extensions from C)
that execute $\mathcal{F}$ on C (no matter what the adversary has done) will receive values
in the range $\{v, \ldots, v + \ell\}$; i.e., p and q's values are at most $\ell$ apart.
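
The notion of a defined value can be illustrated on a concrete set of returned clock values. The sketch below assumes that the values returned to all non-faulty nodes over all adversarial moves have already been collected into one list (enumerating adversarial moves is outside its scope); the function name `defined_values` is hypothetical.

```python
from typing import Iterable, List


def defined_values(returned: Iterable[int], l: int, k: int) -> List[int]:
    """All v such that every returned value lies in {v, v+1, ..., v+l} mod k."""
    values = set(returned)
    return [v for v in range(k)
            if all((x - v) % k <= l for x in values)]


# k = 10, l = 2: the values 9, 0, 1 fit only the window starting at 9.
around_wrap = defined_values([9, 0, 1], l=2, k=10)   # [9]
# The values 3, 4 fit two windows, so there are two defined values.
two_choices = defined_values([3, 4], l=2, k=10)      # [2, 3]
# The values 0 and 5 are too far apart for any window of width 2.
none_fit = defined_values([0, 5], l=2, k=10)         # []
```

The second call shows why a configuration may have more than one defined value, and the third shows a configuration that is not well-defined for this choice of l.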

Suppose $C_0$ is $\ell$-well-defined with value v, and that a non-faulty node p
performs an atomic step at $C_0$ resulting in $C_1$ and then a non-faulty node q performs
an atomic step at $C_1$. Definition 6 does not imply any constraint on the value of
$\mathcal{F}_p(C_0)$ with respect to $\mathcal{F}_q(C_1)$, therefore the following definition is required:

Definition 7. A run is $\ell$-well-defined (w.r.t. a clock-function $\mathcal{F}$) if: a) every
configuration C in the run is $\ell$-well-defined; b) for two consecutive configurations
C, C', if v is a defined value of C and v' is a defined value of C' then $v \preceq_\ell v'$.

Definition 7 states that the values of a clock-function $\mathcal{F}$ on consecutive
configurations cannot be arbitrary. That is, they must be at most $\ell$ apart from
those of the previous configuration. However, there is no requirement that they actually
increase; i.e., "liveness" is not captured by the previous definitions. 

Definition 8. A run is $\ell$-clock-synchronized (w.r.t. some clock-function $\mathcal{F}$),
if it is $\ell$-well-defined (w.r.t. $\mathcal{F}$) and the defined values of consecutive
configurations change infinitely many times. (I.e., for infinitely many consecutive
configurations C, C' the defined values of C' differ from the defined values of C.)

Notice that Definition 7 already requires that defined values of consecutive
configurations are non-decreasing (assuming that $\ell$ is sufficiently small with
respect to k). Thus, combined with Definition 8, it implies that in an $\ell$-clock-synchronized
run, infinitely many configurations are configurations with increasing
defined values (informally, "increasing" means that one defined value is achieved
by adding less than $\frac{k}{2}$ to a previous defined value).

Remark 1. Definition 7 and Definition 8 impose requirements on the defined values
of consecutive configurations. However, a specific node p might compute a
clock value that is decreasing between consecutive configurations, i.e., p's clock
might "go backward". For example, let C, C' be two consecutive configurations,
and let the defined value of both configurations be v. It is possible that p will
compute the clock value to be v + 1 for C, while computing the clock value to be
v for C'.

However, this possibility is inherent to an asynchronous Byzantine tolerant
clock synchronization that has a wraparound value k. Consider a setting in which
all nodes but one are advanced in a synchronous manner, while a single node p
performs atomic steps only once every $k - 1$ rounds. In such a setting, p should
update its clock value to be slightly below its previous value (alternatively, it can
be seen as increasing the value by $k - 1$).
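
The arithmetic behind this example is short enough to spell out. The numbers below are hypothetical, and the sketch assumes that the system is already synchronized and that the other nodes advance the clock by exactly one per round.

```python
k = 12                      # hypothetical wraparound value
v = 5                       # p's clock value at its previous atomic step
rounds_elapsed = k - 1      # p performs a step only once every k - 1 rounds

# The other nodes advanced by one per round, so p should adopt this value.
new_value = (v + rounds_elapsed) % k
# The same value, seen as "slightly below" p's previous value.
slightly_below = (v - 1) % k
```

Both expressions evaluate to 4: advancing by k - 1 and stepping back by 1 are indistinguishable modulo k.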



Algorithm Async-Clock                        /* executed on node q */

01: do forever:
        /* read all registers */
02:     for i := 1 to n
03:         set val_i := read R_{p_i,q} mod k;

        /* some internal definitions */
04:     let #v denote the number of times v appears in {val_i}_{i=1}^{n};
05:     let count(v, l) denote Σ_{j=0}^{l} #(v ⊕_k j);
06:     let pass(l, a) denote {v | count(v, l) ≥ a};

        /* update my_val */
07:     if pass(0, n − f) ≠ ∅ then
08:         set my_val_q := 1 ⊕_k max{pass(0, n − f)};
09:     else if pass(1, n − f) ≠ ∅ then
10:         set my_val_q := 1 ⊕_k max{pass(1, n − f)};
11:     else if pass(1, n − 2f) ≠ ∅ then
12:         let low ∈ pass(1, n − 2f) be such that low ⊕_k 1 ∉ pass(1, n − 2f);
13:         let relative_median := min{l | l ≥ 0 & count(low, l) > n/2};
14:         set my_val_q := low ⊕_k relative_median;
15:     else set my_val_q := randomly select a value from pass(1, n − 3f) ∪ {0};

        /* write my_val to registers */
16:     for i := 1 to n
17:         write my_val_q into R_{q,p_i};
18: od;

Fig. 1. A self-stabilizing Byzantine tolerant algorithm solving the 5-Clock-Synchronization
problem.



Definition 9. An algorithm A solves the $\ell$-clock-synchronization problem
if there is a clock-function $\mathcal{F}$ s.t. any fair run starting from any arbitrary
configuration has a suffix that is $\ell$-clock-synchronized with respect to $\mathcal{F}$.



An ideal protocol would solve the 0-clock-synchronization problem. However,
due to the asynchronous nature of the discussed model, the best that can be
expected is to solve the 1-clock-synchronization problem. We aim at solving the
$\ell$-clock-synchronization problem for as many values of $\ell \ge 1$ as possible. Clearly,
if A solves the $\ell_1$-clock-synchronization problem, then A also solves the
$\ell_2$-clock-synchronization problem for any $\frac{k}{2} > \ell_2 \ge \ell_1$.

Therefore, the rest of the paper concentrates on solving the 5-clock-synchronization
problem; thus, solving the $\ell$-clock-synchronization problem for all $\frac{k}{2} > \ell \ge 5$. In
Section 7.1 we show how to use a solution to any $\ell$-clock-synchronization problem to solve
the 1-clock-synchronization problem, thus solving the $\ell$-clock-synchronization
problem for all $\frac{k}{2} > \ell \ge 1$.



4 Solving the 5-Clock-Synchronization Problem 

An atomic step consists of reading all registers, performing some calculations and 
writing to all registers. Thus, an atomic step consists of executing once an entire 
"loop" of Async-Clock (see Figure 1). 

Each non-faulty node p has a bounded integer variable, $\mathit{my\_val}_p$, which
represents the current clock value of p. When p performs an atomic step, it reads all
of its registers, thus getting an impression of the clock values of the other nodes.
It then computes its own new clock value (which is saved in $\mathit{my\_val}_p$) and writes
$\mathit{my\_val}_p$ to all registers.

Async-Clock operates in a similar fashion to many other Byzantine tolerant
algorithms. It first gathers information regarding the clock value of the other
nodes in the system. Then it uses various thresholds to decide on the clock value
for the next step. If no threshold works (i.e., no clear majority is found), it
chooses a random value from a small set of options.

To ensure all values read during Lines 02-03 are in the range $[0, \ldots, k - 1]$,
the algorithm applies "mod k" to the values read. This is a standard way of
dealing with uninitialized values.

The crux of Async-Clock is in the exact thresholds and their application
(Lines 07-15). In these lines, node p considers different possibilities. Either it sees
a decisive majority towards some clock value (Line 07 and Line 09), in which case
p updates its local clock value to coincide with the majority clock value it has
seen. Alternatively, if no clear majority exists (Line 15), p randomly selects a new
clock value. The interesting case is when p sees a "partial" majority (Line 11),
in which case p takes the relative median of the clock values it has seen. We call
this a "relative median" since the clock values are "mod k" and thus the median
is not well defined.
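
The threshold cascade described above can be sketched in code. This is an illustrative reading of Figure 1, not the authors' implementation, and several details are assumptions made here: the threshold comparison in pass is taken to be "at least", the relative median is taken to be the first offset at which more than n/2 of the read values are covered, `low` is selected exactly as the figure is transcribed, and `max` is the plain integer maximum. The names `async_clock_step` and `passing` are hypothetical (`pass` is a reserved word in Python).

```python
import random
from typing import List, Optional


def async_clock_step(raw: List[int], n: int, f: int, k: int,
                     rng: Optional[random.Random] = None) -> int:
    """Return the new my_val computed from the n register values read."""
    rng = rng or random.Random()
    vals = [x % k for x in raw]                 # Lines 02-03: normalise mod k

    def count(v: int, l: int) -> int:
        # Number of read values equal to v, v+1, ..., v+l (mod k).
        return sum(vals.count((v + j) % k) for j in range(l + 1))

    def passing(l: int, a: int) -> List[int]:
        # pass(l, a): all values v whose count(v, l) reaches the threshold a.
        return [v for v in range(k) if count(v, l) >= a]

    # Lines 07-10: a decisive majority on one value, or on two adjacent ones.
    decisive = passing(0, n - f) or passing(1, n - f)
    if decisive:
        return (max(decisive) + 1) % k

    # Lines 11-14: a partial majority; move to the relative median.
    partial = passing(1, n - 2 * f)
    if partial:
        # The default partial[0] is a guard added for this sketch only.
        low = next((v for v in partial if (v + 1) % k not in partial),
                   partial[0])
        relative_median = next(l for l in range(k) if 2 * count(low, l) > n)
        return (low + relative_median) % k

    # Line 15: no majority at all; choose at random from a small set.
    options = sorted(set(passing(1, n - 3 * f)) | {0})
    return rng.choice(options)


# n = 7, f = 1, k = 10: five of seven values sit on 4 and 5, which is a
# partial majority (threshold n - 2f = 5) but not a decisive one (n - f = 6).
new_val = async_clock_step([4, 4, 4, 5, 5, 8, 9], n=7, f=1, k=10)
```

The sketch scans all k candidate values for simplicity; for a large wraparound value one would scan only values near those actually read.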

The full Async-Clock algorithm appears in Figure 1. Async-Clock solves
the $\ell$-Clock-Synchronization problem for $\ell = 5$; combined with the discussion at
the end of Section 3, it shows how to solve the $\ell$-Clock-Synchronization problem
for any $\frac{k}{2} > \ell \ge 5$.

Notation 4.1. The value of any variable var at configuration $C_r$ is denoted by
$C_r(\mathit{var})$. For a node q that does not perform the atomic step that changes $C_r$ to
$C_{r+1}$, the value of $\mathit{my\_val}_q$, denoted $C_r(\mathit{my\_val}_q)$, is the same before and after the
atomic step.

For node p that performs the atomic step at configuration $C_r$, my_val is the
only variable that is not deterministically determined by the values of the registers
at the beginning of p's atomic step. In such a case, for my_val, the notation
$C_r(\mathit{my\_val}_p)$ denotes the value of my_val before p starts its atomic step, and
$C_{r+1}(\mathit{my\_val}_p)$ denotes the value of my_val after p finishes its atomic step. For
all other variables, $C_r(\mathit{var})$ will denote the value of variable var, as computed for
configuration $C_r$; e.g., $C_r(\mathit{pass}_p(1, n - 2f))$ denotes the value that p computes for
$\mathit{pass}(1, n - 2f)$ during its atomic step at $C_r$.



5 Correctness Proof 

In the following discussion we consider the system only after all transient faults 
ended and each non-faulty node has taken at least one atomic step. We consider 
only runs of the system that begin after that initial sequence of atomic steps. 

Informally, a round is a portion of a run such that each node that is non-faulty
throughout the round performs an atomic step at least once. The first
round (of a run T) is the minimal prefix $\mathcal{R}$ of the run T such that each node
that is non-faulty throughout $\mathcal{R}$ performs an atomic step at least once. Consider
the suffix T' of T after the first round was removed. The second round of T is
the first round of T'; the definition continues so recursively.
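
The recursive definition of rounds amounts to a single greedy pass over the run. The following sketch is an illustration under simplifying assumptions: the run is given as a finite list of node identifiers (one per atomic step), the set of non-faulty nodes is non-empty and fixed for the whole prefix, and the function name `split_into_rounds` is hypothetical.

```python
from typing import List, Set


def split_into_rounds(schedule: List[int],
                      non_faulty: Set[int]) -> List[List[int]]:
    """Greedily cut a finite schedule into consecutive complete rounds."""
    rounds: List[List[int]] = []
    current: List[int] = []
    seen: Set[int] = set()
    for node in schedule:
        current.append(node)
        if node in non_faulty:
            seen.add(node)
        if seen == non_faulty:
            # Every non-faulty node has stepped: this is the minimal prefix.
            rounds.append(current)
            current, seen = [], set()
    return rounds            # trailing steps of an incomplete round are dropped


# Node 3 is faulty; its steps belong to a round but do not help complete it.
rounds = split_into_rounds([0, 1, 3, 2, 2, 0, 1, 0], non_faulty={0, 1, 2})
```

Here the schedule yields two complete rounds, `[0, 1, 3, 2]` and `[2, 0, 1]`, and the final step of node 0 starts a third round that the prefix does not finish.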

Consider any fair run of the system $C_0 \to C_1 \to \cdots \to C_r \to \cdots$, and consider
the transition from configuration $C_r$ to configuration $C_{r+1}$, due to some (possibly
faulty) node p's atomic step. Since we consider only runs after each non-faulty
node q has taken at least one atomic step past the end of the transient fault
events, the value of $\mathit{my\_val}_q$ reflects the latest value written to all of q's write-registers.
This property is true for all configurations that we consider. Thus,
regarding a non-faulty p that performs an atomic step, for all non-faulty q it
holds that $R_{q,p} = \mathit{my\_val}_q$.

The proof outline is as follows. First, a tight configuration is defined (see
Definition 12). Second, we show that if a configuration $C_r$ is tight then so is $C_{r+1}$. Third, if $C_r$
is not tight, then we show that with probability $\frac{1}{3^{n-2f}}$ some configuration within 2
rounds from $C_r$ will be tight. We conclude that after an expected $O(3^{n-2f})$ rounds
the system reaches a tight configuration, and all following configurations are tight
as well. At this stage, we need to show that the value v that a configuration is
tight around continuously increases.

To do so, we show that given that all configurations are tight, different non- 
faulty nodes that perform atomic steps can have values from a set containing (at 
most) 3 consecutive values. Moreover, for consecutive configurations, the minimal 
value among these 3 values can increase by at most 3. Lastly, by closely analyzing 
the behavior of Async-Clock, we conclude that within 4 rounds the minimal 
value above increases. That is, the clock function value changes, and changes 
again within at most 4 rounds, i.e., the clock value changes infinitely many times. 

The reason behind the increase of the aforementioned minimal value lies in the 
following claim: one of two things can happen, either the minimal value increases, 
or all the non-faulty nodes' clock values become at most 1 apart. In the second 
scenario, after one round, the minimal value will increase. We conclude that the
clock value changes infinitely many times, as required.

Remark 2. The en masse property is used in the proof that if $C_r$ is not tight, then
with probability $\frac{1}{3^{n-2f}}$ a configuration within 2 rounds from $C_r$ will be tight. This is
because in an en masse run some set of $n - 2f$ different non-faulty nodes is required to
take atomic steps in a consecutive manner. Together with a claim stating that
each such step has probability $\frac{1}{3}$ to flip a coin "in the right direction", we get
that with probability $\frac{1}{3^{n-2f}}$ a tight configuration is reached.

Lemma 1. If $v_1 \preceq_d v'$ and $v_2 \preceq_d v'$ then either
$v_2 \preceq_d v_1$ or $v_1 \preceq_d v_2$.

Proof. By definition, there are $j_1, j_2$ ($0 \le j_1, j_2 \le d$) such that
$v' = v_1 \oplus_k j_1$ and $v' = v_2 \oplus_k j_2$. Thus,
$v_2 \oplus_k j_2 = v_1 \oplus_k j_1$, which means that
$v_2 = v_1 \oplus_k (j_1 - j_2)$. Clearly, $|j_1 - j_2| \le d$. If
$j_1 - j_2 \ge 0$ then $v_2$ is at most $j_1 - j_2 \le d$ ahead of $v_1$.
Otherwise, $j_2 - j_1 > 0$, meaning that $v_1 = v_2 \oplus_k (j_2 - j_1)$. That
is, $v_1$ is at most $j_2 - j_1 \le d$ ahead of $v_2$. ■
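The relation used in Lemma 1 can be made concrete with a small sketch. It assumes that clock values are integers in $\{0, \dots, k-1\}$, that $\oplus_k$ is addition modulo $k$, and that $d < k$; the function names `at_most_ahead` and `lemma1_holds` are illustrative and do not appear in the paper. The second function checks the statement of Lemma 1 exhaustively for a given $k$ and $d$.

```python
from itertools import product


def at_most_ahead(v, w, d, k):
    """Return True when w is at most d ahead of v modulo k (v <=_d w)."""
    return (w - v) % k <= d


def lemma1_holds(k, d):
    """Exhaustively check Lemma 1 for wrap-around value k and distance d."""
    for v1, v2, target in product(range(k), repeat=3):
        both_behind = (at_most_ahead(v1, target, d, k)
                       and at_most_ahead(v2, target, d, k))
        if both_behind:
            # Lemma 1: one of v1, v2 is at most d ahead of the other.
            if not (at_most_ahead(v2, v1, d, k) or at_most_ahead(v1, v2, d, k)):
                return False
    return True
```

With $k = 8$, for example, the value 0 is one step ahead of 7 because of the wrap-around, while 7 is seven steps ahead of 0.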

We are interested in the set of non-faulty nodes that are "close" to each other 
with respect to their value of my_val. 

Definition 10. $H(C_r, v, d)$ is the set containing any non-faulty node $q$, such
that $C_r(my\_val_q)$ is at most $d$ ahead of $v$. Formally,
$H(C_r, v, d) = \{\text{non-faulty } q \mid v \preceq_d C_r(my\_val_q)\}$.

Definition 11. $H(C_r, p, v, d)$ is the set containing any node $q$, such that
$C_r(R_{q,p})$ is at most $d$ ahead of $v$. Formally,
$H(C_r, p, v, d) = \{q \mid v \preceq_d C_r(R_{q,p})\}$.

Notice that $H(C_r, v, d)$ contains only non-faulty nodes, while
$H(C_r, p, v, d)$ may contain faulty nodes. The difference stems from
$H(C_r, p, v, d)$ representing what $p$ "perceives" at configuration $C_r$, as
opposed to $H(C_r, v, d)$ which says "what is true" in configuration $C_r$.

Lemma 2. $|H(C_r, v, d)| \ge |H(C_r, p, v, d)| - f$ and
$|H(C_r, p, v, d)| \ge |H(C_r, v, d)|$.

Remark 3. Notice that $H(C_r, p, v, d)$ contains all the nodes (including faulty
nodes) whose registers' value (in $C_r$) is at most $d$ ahead of $v$. Thus,
$v \in C_r(pass_p(d, x)) \ne \emptyset$ if (and only if)
$|H(C_r, p, v, d)| \ge x$.

Definition 12. A configuration $C_r$ is tight around value $v$ if
$|H(C_r, v, 1)| \ge n - 2f$; a configuration is tight if it is tight around some
value.
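Definitions 10 and 12 translate directly into code. The following is a minimal sketch under stated assumptions: values are integers modulo $k$, `my_val` is a hypothetical dictionary mapping each non-faulty node identifier to its current value, and the helper names are illustrative only.

```python
def at_most_ahead(v, w, d, k):
    """Return True when w is at most d ahead of v modulo k."""
    return (w - v) % k <= d


def H(my_val, v, d, k):
    """Non-faulty nodes whose value is at most d ahead of v (Definition 10)."""
    return {q for q, val in my_val.items() if at_most_ahead(v, val, d, k)}


def tight_values(my_val, n, f, k):
    """All values v that the configuration is tight around (Definition 12)."""
    return [v for v in range(k) if len(H(my_val, v, 1, k)) >= n - 2 * f]


def is_tight(my_val, n, f, k):
    """A configuration is tight if it is tight around some value."""
    return bool(tight_values(my_val, n, f, k))
```

For $n = 7$, $f = 1$ and $k = 8$, a configuration in which five of the six non-faulty nodes hold values 3 or 4 is tight around 3, whereas a configuration whose values are spread evenly over the whole range is not tight.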

Lemma 3. If a configuration $C_r$ is tight around value $v$ and around value
$v' \ne v$, then either $v \preceq_1 v'$, or $v' \preceq_1 v$.

Proof. By the lemma's assumption, it holds that $|H(C_r, v, 1)| \ge n - 2f$ and
$|H(C_r, v', 1)| \ge n - 2f$. Since $n > 6f$, there is some non-faulty node
$q \in H(C_r, v, 1) \cap H(C_r, v', 1)$. Thus, $C_r(my\_val_q)$ is at most 1
ahead of $v$ and at most 1 ahead of $v'$. The rest follows from Lemma 1.
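The existence of the common node $q$ follows from a counting step that the proof leaves implicit. A sketch of that step, assuming only that both sets are subsets of the $n$ nodes of the system:

```latex
\[
|H(C_r,v,1) \cap H(C_r,v',1)|
  \;\ge\; |H(C_r,v,1)| + |H(C_r,v',1)| - n
  \;\ge\; 2(n-2f) - n \;=\; n - 4f \;>\; 0 .
\]
```

The last inequality uses $n > 6f$. The same computation with both sets of size at least $n - 3f$ gives an intersection of size at least $n - 6f > 0$, which is the form needed when the threshold is $n - 3f$.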

Remark 4. Following the same line of proof as in Lemma 3 shows that
$C_r(pass_p(1, n - 2f))$ can contain at most 2 values, and these values are
consecutive values.

Lemma 4. If in configuration $C_r$, non-faulty node $p$ performs Line 12 then
$C_r(low_p)$ is well defined, for $k > 3$.

Proof. By Remark 4, if $p$ passes the condition of Line 11 then
$C_r(pass_p(1, n - 2f))$ contains at most 2 values, which are consecutive. Thus,
if $k > 3$ then $C_r(low_p)$ is well defined.

Lemma 5. If $p$ passes the condition in Line 11 and $|H(C_r, v, 1)| \ge n - 3f$
then $v \preceq_1 (low \oplus_k relative\text{-}median)$, for $k > 4$.

Proof. $p$ passed the condition in Line 11, thus
$C_r(pass_p(1, n - 2f)) \ne \emptyset$. Thus, for some $v'$ it holds that
$|H(C_r, p, v', 1)| \ge n - 2f$ (see Remark 3), and therefore
$|H(C_r, v', 1)| \ge n - 3f$ (see Lemma 2). By the lemma's assumption,
$|H(C_r, v, 1)| \ge n - 3f$. Since $n > 6f$, there is some non-faulty node
$q \in H(C_r, v', 1) \cap H(C_r, v, 1)$. That is, $v' \preceq_1 C_r(my\_val_q)$
and $v \preceq_1 C_r(my\_val_q)$. By Lemma 1 either $v \preceq_1 v'$ or
$v' \preceq_1 v$.

According to Line 12 and Remark 4, $low \oplus_k 1 \preceq_1 v'$. Thus, in both
scenarios ($v \preceq_1 v'$ or $v' \preceq_1 v$) it holds that
$low \preceq_3 v$. Informally, $low$ is "before" $v$, and $relative\text{-}median$
(see Line 13) is increased until there are more than $\frac{n}{2}$ nodes in the
range $[low, low \oplus_k relative\text{-}median]$. Since there are
$\ge n - 3f > \frac{n}{2}$ copies of "$v$", $relative\text{-}median$ will be such
that $low \oplus_k relative\text{-}median \in \{v, v \oplus_k 1\}$.

Formally, $relative\text{-}median$ is the minimal value such that
$count_p(low, relative\text{-}median)$ contains more than $\frac{n}{2}$ nodes.
Since $|H(C_r, v, 1)| \ge n - 3f > \frac{n}{2}$, at least one copy of "$v$" is
counted towards the sum of $count_p(low, relative\text{-}median)$. Since $k > 4$
(by the lemma's assumption), copies of $v$ will not be counted in
$count_p(low, relative\text{-}median')$ for $relative\text{-}median'$ such that
$low \oplus_k relative\text{-}median' \notin \{v, v \oplus_k 1\}$. On the other
hand, $count_p(low, relative\text{-}median') \ge n - 2f$ for
$relative\text{-}median'$ such that
$low \oplus_k relative\text{-}median' = v \oplus_k 1$. Thus,
$relative\text{-}median \oplus_k low = v$ or
$relative\text{-}median \oplus_k low = v \oplus_k 1$. In both cases
$v \preceq_1 low \oplus_k relative\text{-}median$.

Lemma 6. If a configuration $C_r$ is tight then so is $C_{r+1}$.

Proof. If $p$ is faulty, its update of $my\_val_p$ and/or its write-registers do
not affect the "tightness" of the configuration $C_{r+1}$. Thus, the rest of the
proof assumes that $p$ is non-faulty.

First, notice that if $C_r(pass_p(0, n - f)) \ne \emptyset$ then
$C_r(pass_p(0, n - f))$ contains a single value. This is because
$C_r(pass_p(0, n - f))$ contains all the values $v$ that appear at least $n - f$
times in the registers read by $p$. If two values appear at least $n - f$ times
they must be the same value (since $n > 6f$).

Consider $p$ updating $my\_val_p$. If $p$ updates it in Line 08, then it must
have passed the "if" in Line 07. Thus, $C_r(pass_p(0, n - f)) \ne \emptyset$,
which means that $C_r(pass_p(0, n - f)) = \{v\}$. Thus, $my\_val_p$ is updated to
$v \oplus_k 1$. From Remark 3 it holds that $|H(C_r, p, v, 0)| \ge n - f$, and
from Lemma 2 it holds that $|H(C_r, v, 0)| \ge n - 2f$. Thus, after $p$'s update
of $my\_val_p$ to $v \oplus_k 1$ (and $p$'s writing $my\_val_p$ to all of $p$'s
write-registers), $|H(C_{r+1}, v, 1)| \ge n - 2f$ holds. Thus, configuration
$C_{r+1}$ is tight.

If $p$ updates $my\_val_p$ in Line 10, then
$C_r(pass_p(1, n - f)) \ne \emptyset$. Denote by
$v := \max\{pass_p(1, n - f)\}$ (in Line 10, $my\_val_p$ is updated to
$v \oplus_k 1$). Notice that $v \preceq_1 my\_val_p$. In addition,
$v \in C_r(pass_p(1, n - f))$, thus (by Remark 3),
$|H(C_r, p, v, 1)| \ge n - f$. Therefore, after $p$'s update of $my\_val_p$,
$p \in H(C_{r+1}, v, 1)$, which means that
$|H(C_{r+1}, v, 1)| \ge |H(C_r, v, 1)|$ (because $p$ may not be counted for in
$H(C_r, v, 1)$). By Lemma 2, $|H(C_r, v, 1)| \ge n - 2f$, thus we have that
$|H(C_{r+1}, v, 1)| \ge n - 2f$, i.e., configuration $C_{r+1}$ is tight.

We are left to consider updates of $my\_val_p$ in Line 14 and Line 15. Notice
that since $C_r$ is tight, $|H(C_r, p, v, 1)| \ge n - 2f$, for some $v$. Thus,
$C_r(pass_p(1, n - 2f)) \ne \emptyset$. Therefore, $p$ passes the condition of
Line 11, and $p$ does not perform Line 15.

According to the lemma's assumption $C_r$ is tight around some value $v$, and by
Lemma 5 we have that $v \preceq_1 relative\text{-}median \oplus_k low$. That is,
$p$ updates $my\_val_p$ such that $p \in H(C_{r+1}, v, 1)$. Since
$|H(C_r, v, 1)| \ge n - 2f$ and since $p$ is the only node changing its
$my\_val_p$ at $C_r$, it holds that $|H(C_{r+1}, v, 1)| \ge n - 2f$. That is,
$C_{r+1}$ is tight.

The following lemmas assume all runs are en masse. In Section 6 this assumption
is removed (at the cost of reducing the fault tolerance to $n > 12f$).

Lemma 7. Consider an en masse run, and a non-faulty node $p$ performing an
atomic step at configuration $C_r$. Denote by $S_i$ the set of non-faulty nodes
that have performed an atomic step between $C_r$ and $C_{r+i}$. Let $m$ be the
minimal $m$ s.t. $|S_m| = n - 2f$, then each node $q \in S_m$ performed an
atomic step exactly once between $C_r$ and $C_{r+m}$.

Proof. First, since we consider only fair runs, eventually $S_i$ will contain
all non-faulty nodes. Thus, $m$, as defined in the lemma, is well defined.
Assume by way of contradiction that some node $q$ in $S_m$ performed two atomic
steps between $C_r$ and $C_{r+m}$. In that case, there must be $n - 2f$
non-faulty nodes that perform atomic steps between $q$'s two atomic steps. Thus,
all of these nodes must be in $S_m$, leading to the fact that
$|S_{m-1}| \ge n - 2f$ which contradicts $m$ being the minimal $m$ s.t.
$|S_m| = n - 2f$. Therefore, there is no such $q \in S_m$, and all nodes in
$S_m$ perform a single atomic step between $C_r$ and $C_{r+m}$.

Lemma 8. Let $C_{r_1}$ denote the first configuration of some round
$\mathcal{R}$, and $C_{r_2}$ denote the last configuration of round
$\mathcal{R}+1$. With probability at least $\frac{1}{3^{n-2f}}$ configuration
$C_{r_2}$ is tight.

Proof. Consider non-faulty nodes performing atomic steps between $C_{r_1}$ and
$C_{r_2}$. If some configuration $C_r$, $r_1 < r < r_2$ is tight, then, by using
Lemma 6, every configuration after $C_r$ is tight. Thus, $C_{r_2}$ is tight.
Therefore, our target is proving that with probability at least
$\frac{1}{3^{n-2f}}$ some configuration $C_r$ is tight.

Let $C_r$, $r_1 < r < r_2$ be some configuration, and let $p$ be a non-faulty
node performing an atomic step on $C_r$. $p$ performs exactly one of the
following: Line 08, Line 10, Line 14 or Line 15. If $p$ performs Line 08 or
Line 10, then $C_r(pass_p(1, n - f)) \ne \emptyset$. That is, for some value
$v \in C_r(pass_p(1, n - f))$ it holds that $|H(C_r, p, v, 1)| \ge n - f$ which
means that $|H(C_r, v, 1)| \ge n - 2f$ (see Lemma 2); that is, $C_r$ is tight
around $v$. Therefore, if any non-faulty node performs Line 08 or Line 10 during
rounds $\mathcal{R}, \mathcal{R}+1$, then $C_{r_2}$ is tight.

The rest of the proof assumes no non-faulty node performs either Line 08 or
Line 10 on any configuration $C_r$, $r_1 < r < r_2$. Consider the first $n - 2f$
non-faulty nodes performing atomic steps in round $\mathcal{R}$. (By Lemma 7
these nodes perform exactly one atomic step, i.e., the adversary cannot
reschedule a node if "it does not like" the outcome of that node's random coin.)
If they all perform only Line 15, then there is some probability that they all
choose the same value of $my\_val$, as they all choose from a set that contains
"0". Each node chooses from a set $pass(1, n - 3f) \cup \{0\}$, which contains
at most 3 items. Thus, with probability at least $\frac{1}{3^{n-2f}}$ all
non-faulty nodes choose the same value, leading to a tight configuration.

The proof continues under the assumption that some non-faulty node $p$ performs
Line 14 on some configuration $C_r$ during round $\mathcal{R}$. Using the
notations $S_m$ of Lemma 7, there are $n - 2f$ non-faulty nodes that perform
atomic steps between $C_r$ and $C_{r'} = C_{r+m}$. Notice that since all
non-faulty nodes perform an atomic step during round $\mathcal{R}+1$ then
configuration $C_{r'}$ is reached in round $\mathcal{R}$ or in round
$\mathcal{R}+1$, and in any case $C_{r'}$ is reached before $C_{r_2}$.

Since $p$ performs Line 14 it passed the condition of Line 11 and it holds that
$C_r(pass_p(1, n - 2f)) \ne \emptyset$. By Lemma 2 and Remark 3,
$|H(C_r, v, 1)| \ge n - 3f$ for some value $v$. According to Lemma 5,
$v \preceq_1 C_{r+1}(my\_val_p)$; thus, $|H(C_{r+1}, v, 1)| \ge n - 3f$.

The proof continues by showing that if $|H(C_{r''}, v, 1)| \ge n - 3f$ for
$r''$, $r + 1 < r'' < r'$, then with probability at least $\frac{1}{3}$ it holds
that $|H(C_{r''+1}, v, 1)| \ge n - 3f$; and if node $q$ performing an atomic
step on $C_{r''}$ is non-faulty then also $q \in H(C_{r''+1}, v, 1)$. Assume
that $|H(C_{r''}, v, 1)| \ge n - 3f$ and consider a node $q$ performing an
atomic step on $C_{r''}$. If $q$ is faulty then its action does not change the
value of $H(C_{r''+1}, v, 1)$, thus $|H(C_{r''+1}, v, 1)| \ge n - 3f$. If $q$ is
non-faulty and it performs Line 15 then since
$v \in C_{r''}(pass_q(1, n - 3f))$, with probability at least $\frac{1}{3}$, $q$
selects $v$ as its value of $my\_val_q$ (as it is selected from a set containing
at most three items). On the other hand, if $q$ performs Line 14, then since
$|H(C_{r''}, v, 1)| \ge n - 3f$, by Lemma 5 $q \in H(C_{r''+1}, v, 1)$ and
$|H(C_{r''+1}, v, 1)| \ge n - 3f$. Thus, in either case with probability at
least $\frac{1}{3}$ it holds that $q \in H(C_{r''+1}, v, 1)$ and
$|H(C_{r''+1}, v, 1)| \ge n - 3f$.

Therefore, if at some configuration $C_r$ it holds that
$|H(C_r, v, 1)| \ge n - 3f$, then any non-faulty node $q$ operating in a
configuration $C_{r''}$, $r'' > r$ will have $q \in H(C_{r''+1}, v, 1)$ (with
probability $\ge \frac{1}{3}$). Therefore, once $n - 2f$ non-faulty nodes
perform an atomic step, they are all in $H(C_{r'}, v, 1)$ with probability at
least $\frac{1}{3^{n-2f}}$. Thus, $C_{r'}$ is tight with probability
$\ge \frac{1}{3^{n-2f}}$. ■
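The step from Lemma 8 to an expected convergence time is a standard geometric-trials argument. The sketch below assumes that every window of two rounds reaches a tight configuration with probability at least $3^{-(n-2f)}$ regardless of the configuration it starts from (which is what Lemma 8 provides for an arbitrary round), so the expected number of rounds is at most the window length divided by that probability. The function names are illustrative.

```python
from fractions import Fraction


def success_probability(n, f):
    """Lower bound from Lemma 8 on reaching a tight configuration per window."""
    return Fraction(1, 3 ** (n - 2 * f))


def expected_rounds_bound(n, f, window=2):
    """Upper bound on expected rounds until tightness: window / probability."""
    return window / success_probability(n, f)
```

The bound is $2 \cdot 3^{n-2f}$, which matches the $O(3^{n-2f})$ expectation stated in the proof outline.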



From this point on, the discussion assumes that all configurations are tight. 
Therefore, Line 15 will never be executed. 



Definition 13. Denote by $V(C_r)$ the set containing any value $v$ of a
non-faulty node $p$, such that $p$ "helps" in the configuration $C_r$ being
tight around $v$. Formally,

$$V(C_r) = \bigcup_{v \text{ s.t. } |H(C_r,v,1)| \ge n-2f} \{C_r(my\_val_p) \mid p \in H(C_r,v,1)\}.$$

Lemma 9. If $k > 6$, then $V(C_r)$ is exactly one of the following: $\{v\}$,
$\{v, v \oplus_k 1\}$ or $\{v, v \oplus_k 1, v \oplus_k 2\}$, for some value
$v$.

Proof. From Lemma 3 it follows that $|V(C_r)| \le 3$. Moreover, if $k > 6$ then
for any two values $v, v' \in V(C_r)$ it holds that $v \preceq_2 v'$ or
$v' \preceq_2 v$, which is proved by way of contradiction. Assume that neither
hold; notice that $v \in V(C_r)$ due to some value $\tilde{v}$ such that
$\tilde{v} \preceq_1 v$ and $|H(C_r, \tilde{v}, 1)| \ge n - 2f$; for similar
reasons $v' \in V(C_r)$ due to $\tilde{v}' \preceq_1 v'$. Thus, if neither
$v \preceq_2 v'$ nor $v' \preceq_2 v$, we have that
$H(C_r, \tilde{v}, 1) \cap H(C_r, \tilde{v}', 1) = \emptyset$, leading to
$|H(C_r, \tilde{v}, 1) \cup H(C_r, \tilde{v}', 1)| \ge 2(n - 2f) = 2n - 4f > n$,
a contradiction. From the above discussion, if $|V(C_r)| = 3$, it must be of the
form $V(C_r) = \{v, v \oplus_k 1, v \oplus_k 2\}$.

If $|V(C_r)| = 2$, then $V(C_r)$ can either be $\{v, v \oplus_k 1\}$ or
$\{v, v \oplus_k 2\}$. In the second option, no non-faulty node has a value of
$v \oplus_k 1$, that is, $H(C_r, v, 1) \cap H(C_r, v \oplus_k 2, 1) = \emptyset$.
As before, we reach a contradiction from
$|H(C_r, v, 1) \cup H(C_r, v \oplus_k 2, 1)| \ge 2(n - 2f) = 2n - 4f > n$.

Therefore, $V(C_r)$ is exactly one of the following: $\{v\}$,
$\{v, v \oplus_k 1\}$ or $\{v, v \oplus_k 1, v \oplus_k 2\}$.

Lemma 9 leads to defining the "minimal" and "maximal" values of $V(C_r)$ in the
following way:
$V_{min}(C_r) := \{v \mid v \in V(C_r) \;\&\; v \oplus_k -1 \notin V(C_r)\}$ and
$V_{max}(C_r) := \{v \mid v \in V(C_r) \;\&\; v \oplus_k 1 \notin V(C_r)\}$. By
the above lemma both $V_{min}(C_r)$ and $V_{max}(C_r)$ are well defined (for
$k > 6$).
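To illustrate how $V(C_r)$, $V_{min}$ and $V_{max}$ behave across the wrap-around point, here is a minimal sketch. It assumes integer values modulo $k$ and a hypothetical dictionary `my_val` from non-faulty node identifiers to values; none of these function names come from the paper.

```python
def at_most_ahead(v, w, d, k):
    """Return True when w is at most d ahead of v modulo k."""
    return (w - v) % k <= d


def H(my_val, v, d, k):
    """Non-faulty nodes whose value is at most d ahead of v."""
    return {q for q, val in my_val.items() if at_most_ahead(v, val, d, k)}


def V(my_val, n, f, k):
    """Values of nodes that help the configuration be tight (Definition 13)."""
    values = set()
    for v in range(k):
        members = H(my_val, v, 1, k)
        if len(members) >= n - 2 * f:
            values |= {my_val[q] for q in members}
    return values


def v_min(values, k):
    """Elements of V whose predecessor modulo k is not in V."""
    return {v for v in values if (v - 1) % k not in values}


def v_max(values, k):
    """Elements of V whose successor modulo k is not in V."""
    return {v for v in values if (v + 1) % k not in values}
```

In the example used in the test, with $k = 8$ the set is $\{7, 0\}$, so the "minimal" value is 7 and the "maximal" value is 0, which is why the definitions use $\oplus_k$ rather than the usual order on integers.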

Lemma 10. Let $k > 6$ and let $C_r, C_{r+1}$ be two consecutive configurations.
Then, $V_{min}(C_r) \preceq_3 V_{min}(C_{r+1})$.

Proof. Let $p$ be the node that performs an atomic step between $C_r$ and
$C_{r+1}$. If $p$ is faulty, then its update of $my\_val_p$ does not affect the
value of $V_{min}(C_{r+1})$, and we have that $V_{min}(C_r) = V_{min}(C_{r+1})$,
which means that $V_{min}(C_r) \preceq_3 V_{min}(C_{r+1})$.

The rest of the proof assumes $p$ is non-faulty. $p$ updates $my\_val_p$ due to
Line 08, Line 10 or Line 14. If $p$ performs Line 08 then $my\_val_p$ is updated
to $v \oplus_k 1$, where $v = \max\{pass_p(0, n - f)\}$. By definition,
$v \in V(C_r)$; thus, $V_{max}(C_r) \preceq_1 V_{max}(C_{r+1})$.

Similarly, if $p$ performs Line 10, then $my\_val_p$ is updated to
$v \oplus_k 1$, where $v = \max\{pass_p(1, n - f)\}$. Again, by definition
$v \in V(C_r)$; thus, $V_{max}(C_r) \preceq_1 V_{max}(C_{r+1})$.

Consider $p$ performs Line 14. By definition,
$|H(C_r, V_{min}(C_r), 1)| \ge n - 3f$; thus, by Lemma 5,
$V_{min}(C_r) \preceq_1 (low \oplus_k relative\text{-}median)$. Thus,
$V_{max}(C_r) \preceq_1 V_{max}(C_{r+1})$.

In all 3 scenarios it was shown that $V_{max}(C_r) \preceq_1 V_{max}(C_{r+1})$.
However, $|V(C_r)| \le 3$ (see Lemma 9) and thus
$V_{min}(C_r) \preceq_2 V_{max}(C_r)$. Therefore,
$V_{min}(C_r) \preceq_3 V_{max}(C_{r+1})$. Since
$V_{min}(C_{r+1}) \preceq_2 V_{max}(C_{r+1})$, we have that
$V_{min}(C_r) \preceq_3 V_{min}(C_{r+1})$ as required.

Lemma 11. If a non-faulty node $p$ performs an atomic step between
$C_r, C_{r+1}$ then $C_{r+1}(my\_val_p) \in V(C_{r+1})$.

Proof. If $p$ performs Line 08 or Line 10 then
$C_{r+1}(my\_val_p) = v \oplus_k 1$ for some $v \in C_r(pass_p(1, n - f))$. By
Remark 3 and Lemma 2, $|H(C_r, v, 1)| \ge n - 2f$. Therefore,
$p \in H(C_{r+1}, v, 1)$ and $|H(C_{r+1}, v, 1)| \ge n - 2f$, which means that
$C_{r+1}(my\_val_p) \in V(C_{r+1})$.

Consider $p$ performing Line 14. Since $C_r$ is tight there is some value $v$
such that $|H(C_r, v, 1)| \ge n - 2f$. By Lemma 5 we have that
$v \preceq_1 C_{r+1}(my\_val_p)$. Thus, $p \in H(C_{r+1}, v, 1)$ and
$|H(C_{r+1}, v, 1)| \ge n - 2f$, which means that
$C_{r+1}(my\_val_p) \in V(C_{r+1})$.

Lemma 12. Starting from a tight configuration $C$, within 4 rounds there are two
consecutive configurations $C_r, C_{r+1}$ for which
$V_{min}(C_r) \ne V_{min}(C_{r+1})$.

Proof. Consider configurations $C = C_{r_1}, C_{r_2}, C_{r_3}, C_{r_4}, C_{r_5}$
such that $C_{r_{i+1}}$ is one round after $C_{r_i}$. Let $C_{r'}$,
$r_1 < r' < r_5$ be some configuration, and let $p$ be a non-faulty node
performing an atomic step on $C_{r'}$. If
$V_{min}(C_{r'}) \ne V_{min}(C_{r'+1})$, we are done. Otherwise, assume by way
of contradiction that for all $r_1 < r' < r_5$ it holds that
$V_{min}(C_{r'}) = V_{min}(C_{r'+1})$; denote $V = V_{min}(C_{r_1})$. Therefore,
by Lemma 9 and Lemma 11, $p \in H(C_{r'+1}, V, 2)$. Since all non-faulty nodes
have performed an atomic step between $C_{r_1}$ and $C_{r_2}$, for any
$C_{r'}$, $r_2 < r' < r_5$ it holds that $|H(C_{r'}, V, 2)| = n - f$. We
continue to consider only configurations $C_{r'}$ such that $r_2 < r' < r_5$.

Notice that if $p$ performs Line 08 or Line 10 then $my\_val_p$ is updated to be
at least "+1" from $V$. This is because only $V, V \oplus_k 1, V \oplus_k 2$ may
be in $C_{r'}(pass_p(1, n - f))$ or $C_{r'}(pass_p(0, n - f))$. Thus, taking the
maximum of these sets and adding "1" produces a value that is at least "+1" from
$V$.

We divide the proof into two scenarios: 1) for some configuration $C_{r'}$,
$r_2 < r' < r_4$ it holds that $|H(C_{r'}, V, 0)| < \frac{n}{2} - f$; 2) for all
$C_{r'}$, $r_2 < r' < r_4$ it holds that $|H(C_{r'}, V, 0)| \ge \frac{n}{2} - f$.
Consider the first case, and let $C_{r'}$ be some configuration s.t.
$|H(C_{r'}, V, 0)| < \frac{n}{2} - f$. Clearly, if $p$ performs Line 08 or
Line 10 then it updates $my\_val_p$ to be "greater" than $V$. If $p$ performs
Line 14, then because values that are not $V \oplus_k 1, V \oplus_k 2$ can
appear at most $\frac{n}{2}$ times, $p$ must update $my\_val_p$ to be "greater"
than $V$. Therefore, if $|H(C_{r'}, V, 0)| < \frac{n}{2} - f$, then also
$|H(C_{r'+1}, V, 0)| < \frac{n}{2} - f$. Moreover,
$C_{r'+1}(my\_val_p) \in \{V \oplus_k 1, V \oplus_k 2\}$.

Thus, if for some configuration $C_{r'}$, $r_2 < r' < r_4$ it holds that
$|H(C_{r'}, V, 0)| < \frac{n}{2} - f$, then starting from $C_{r_5}$ (at least
one round after $C_{r'}$), no non-faulty node has $my\_val$ equal to $V$, which
means that $V_{min}(C_{r_5}) \ne V = V_{min}(C_{r_1})$.



We continue under the assumption that $|H(C_{r'}, V, 0)| \ge \frac{n}{2} - f$,
for all $r_2 < r' < r_4$. Recall that all non-faulty nodes have values from the
set $\{V, V \oplus_k 1, V \oplus_k 2\}$. Thus,
$|H(C_{r'}, V \oplus_k 1, 1)| \le \frac{n}{2}$. Therefore, if $p$ passes Line 08
or Line 10 then $C_{r'}(pass_p(0, n - f))$ and $C_{r'}(pass_p(1, n - f))$ do not
contain $V \oplus_k 1, V \oplus_k 2$; which means they may contain
$V \oplus_k -1$ or $V$. Thus, $p$ updates $my\_val_p$ to $V$ or $V \oplus_k 1$.
On the other hand, if $p$ performs Line 14 it updates $my\_val_p$ to be either
$V$ or $V \oplus_k 1$ (recall that $V_{min}(C_{r'}) = V$ which means that
$|H(C_{r'}, V, 1)| \ge n - 2f > \frac{n}{2}$). Thus, in all cases, $p$ updates
$my\_val_p$ to be either $V$ or $V \oplus_k 1$. Therefore,
$|H(C_{r_3}, V, 1)| = n - f$, and $|H(C_{r'}, V, 1)| = n - f$ for all
$r_3 < r' < r_4$.

Thus, any non-faulty node performing an atomic step on configuration $C_{r'}$,
$r_3 < r' < r_4$, either passes the condition of Line 08 or Line 10. In both
cases, it updates $my\_val_p$ to be "greater" than $V$. Thus, starting from
$C_{r_4}$ it holds that $|H(C_{r'}, V, 0)| < \frac{n}{2} - f$, and we are back
to the previous case, in which we have shown that $V_{min}$ must change within 1
round. Thus, for some configuration $C_{r'}$, $r_4 < r' < r_5$ it holds that
$V_{min}(C_{r'}) \ne V$. In other words, within 4 rounds there is some
configuration $C_r$ such that $V_{min}(C_r) \ne V_{min}(C_{r+1})$.

Following is the main result of the paper, which is shown to be true assuming 
that the runs are en masse. In the following section en masse runs are constructed 
from fair runs. Thus, the theorem can be updated to only require that the run is 
fair. 

Theorem 1. Async-Clock solves the 5-clock-synchronization problem within
expected $O(3^{n-2f})$ rounds, for any en masse run and wrap-around value
greater than 6 (i.e., $k > 6$).

Proof. Define the clock-function $\mathcal{F}$ executed by non-faulty node $p$
at configuration $C_r$ to be the value of $C_{r+1}(my\_val_p)$ as updated by
Async-Clock when executed as an atomic step. Combining Lemma 9, Lemma 10 and
Lemma 11 shows that a tight configuration is 5-well-defined with respect to
$\mathcal{F}$, where $V_{min}(C_r)$ is a defined value at $C_r$. In addition,
these lemmas show that any fair run $\mathcal{T}$ consisting of only tight
configurations is 5-well-defined. By Lemma 12, run $\mathcal{T}$ is also
5-clock-synchronized.

Given any en masse run $\mathcal{T}'$ and any initial configuration $C_0$,
Lemma 8 states that with probability $\ge \frac{1}{3^{n-2f}}$ there is some
configuration $C_r \in \mathcal{T}'$ (within two rounds from $C_0$) that is
tight. By Lemma 6 every configuration after $C_r$ is also tight. Thus, every
fair run $\mathcal{T}'$ has a suffix $\mathcal{T}$ that consists of only tight
configurations; and this suffix is reached within $O(3^{n-2f})$ rounds in
expectation. From the above paragraph, $\mathcal{T}$ is 5-clock-synchronized.

Thus, Async-Clock solves the 5-clock-synchronization problem.

6 Ensuring En Masse Runs 

Our goal is to ensure that if a non-faulty node $p$ performs a step, at least
$n - 2f$ non-faulty nodes have performed a step since $p$'s last step. That is,
given an algorithm $\mathcal{A}$ we want to ensure that if some non-faulty node
performs two steps of $\mathcal{A}$ then there are at least $n - 2f$ different
non-faulty nodes that also perform steps of $\mathcal{A}$. To ensure this, we
present an algorithm EnMasse that ensures that a specific action, denoted "act",
is executed twice by the same non-faulty node $p$ only if there are at least
$n - 4f$ other non-faulty nodes that have also executed "act". By setting "act"
to execute an atomic step of $\mathcal{A}$, we achieve the required goal. That
is, Async-Clock will be executed entirely every time "act" appears in EnMasse.
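The goal stated above can be phrased as a predicate on a schedule of atomic steps. The sketch below is only an illustration under stated assumptions: a schedule is a hypothetical list of node identifiers in the order they take steps, and the requirement is read as "between two consecutive steps of the same non-faulty node, at least $n - 2f$ distinct other non-faulty nodes take a step". The name `is_en_masse` is not from the paper.

```python
def is_en_masse(schedule, non_faulty, n, f):
    """Check the en masse requirement on a finite schedule of atomic steps."""
    last_step = {}  # index of the most recent step of each non-faulty node
    for idx, node in enumerate(schedule):
        if node not in non_faulty:
            continue  # faulty nodes are not constrained
        if node in last_step:
            between = schedule[last_step[node] + 1:idx]
            others = {q for q in between if q in non_faulty and q != node}
            if len(others) < n - 2 * f:
                return False
        last_step[node] = idx
    return True
```

A round-robin schedule over the non-faulty nodes satisfies the predicate, while a schedule in which one node steps twice with only a single other node in between does not (for $n = 7$, $f = 1$).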

As the algorithm we present ensures only $n - 4f$ nodes execute "act" in between
two "acts" of every non-faulty node, we must reduce the Byzantine tolerance by
half ($n > 12f$) to use EnMasse as a subcomponent of Async-Clock. That is,
Async-Clock requires a threshold of $\frac{2}{3}n$ non-faulty nodes ($n - 2f$
threshold for $n > 6f$); EnMasse ensures a threshold of $n - 4f$. Therefore, by
reducing the fault tolerance to $n > 12f$ we ensure that
$n - 4f > \frac{2}{3}n$, as required by Async-Clock.
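The arithmetic behind the two thresholds is short enough to spell out. The following lines only restate the inequalities used in the paragraph above:

```latex
\[
n > 6f \;\Rightarrow\; 2f < \tfrac{n}{3} \;\Rightarrow\; n - 2f > \tfrac{2}{3}n,
\qquad
n > 12f \;\Rightarrow\; 4f < \tfrac{n}{3} \;\Rightarrow\; n - 4f > \tfrac{2}{3}n .
\]
```

So the guarantee of $n - 4f$ provided by EnMasse under $n > 12f$ clears the same $\frac{2}{3}n$ bar that $n - 2f$ clears under $n > 6f$.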

Our solution borrows many ideas from [13]. Due to our model's atomicity 
assumptions, each node can read all registers and write to all registers in a single 
atomic step. Thus, the problems that [13] encounters do not exist in the current 
paper at all. However, in the current model there are additional faults (Byzantine
and self-stabilizing) which do not exist in [13]. Interestingly, the same ideas used
in [13] can be adapted to the self-stabilizing Byzantine tolerant setting.

For each node $p$, there is a set of labels $Labels_p$ associated with $p$. In
addition, each node $p$ has a variable $label_p$ from the set $Labels_p$. Also,
$p$ has an ordering vector $order_p$, of length $|Labels_p|$, which induces an
order on the labels in $Labels_p$. Lastly, each node $p$ has a time-stamp
$time_p$, which is a vector of $n$ entries, consisting of a single label
$time_p[q] \in Labels_q$ for each node $q$.

Definition 14. A label $b$ is of type $p$ if $b \in Labels_p$.

Definition 15. Two labels $b, c$ of type $p$ are compared according to
$order_p$, where $b <_p c$ if $b$ appears before $c$ in the vector $order_p$.
The inequalities $\le_p, >_p, \ge_p, =_p$ are similarly defined.

Definition 16. Given two time-stamps $time_p, time_q$, and a set of nodes $I$,
we say that $time_p >_I time_q$ if $p, q \in I$ and for every entry $i \in I$,
$time_p[i] \ge_i time_q[i]$, $time_p[q] =_q time_q[q]$ and
$time_p[p] >_p time_q[p]$.

To simplify notations, when it is clear from the context, we write $p >_I q$
instead of $time_p >_I time_q$. That is, when comparing nodes (according to
$>_I$), we actually compare the nodes' time stamps.

Definition 17. A set $I$ of nodes is comparable if for any $p, q \in I$ either
$p >_I q$ or $q >_I p$.
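Definitions 15 to 17 can be sketched in code as follows. The data layout is an assumption made for illustration: `order[i]` is a list of the labels of type $i$ (a label that appears earlier is smaller), and `time[p]` is a dictionary holding the entry $time_p[i]$ for every node $i$. The function names are hypothetical.

```python
def label_ge(order_i, b, c):
    """b >=_i c: b does not appear before c in the ordering vector of node i."""
    return order_i.index(b) >= order_i.index(c)


def ts_greater(p, q, I, time, order):
    """time_p >_I time_q, following the three conditions of Definition 16."""
    if p not in I or q not in I:
        return False
    entrywise = all(label_ge(order[i], time[p][i], time[q][i]) for i in I)
    same_q_entry = time[p][q] == time[q][q]
    strict_p_entry = order[p].index(time[p][p]) > order[p].index(time[q][p])
    return entrywise and same_q_entry and strict_p_entry


def is_comparable(I, time, order):
    """Every pair of distinct nodes in I is ordered by >_I (Definition 17)."""
    nodes = list(I)
    return all(
        ts_greater(p, q, I, time, order) or ts_greater(q, p, I, time, order)
        for a, p in enumerate(nodes)
        for q in nodes[a + 1:]
    )
```

In the two-node example used in the test, node `a` has seen `b`'s current label and holds a newer label of its own than the one `b` has recorded, so `a` is above `b`. If instead each node holds a newer own label than the other has recorded, neither time-stamp dominates and the set is not comparable.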

Lemma 13. If $I$ is a comparable set, and $p, q, w \in I$, and $p >_I q$,
$q >_I w$ then $p >_I w$.

Proof. Since $I$ is comparable, either $p >_I w$ or $w >_I p$. Suppose by way of
contradiction that $w >_I p$, thus $time_p[w] <_w time_w[w]$. However, since
$p >_I q$ we have that $time_p[w] \ge_w time_q[w]$, and since $q >_I w$ we have
that $time_q[w] =_w time_w[w]$. Thus, $time_p[w] \ge_w time_w[w]$, contradicting
$time_p[w] <_w time_w[w]$. Therefore, it is not true that $w >_I p$, leaving
only one other option: $p >_I w$.

Lemma 14. Let $I, I'$ be comparable sets, then $I \cap I'$ is a comparable set.
Moreover, for $p, q \in I \cap I'$, $p <_{I \cap I'} q$ iff $p <_I q$.

Proof. Let $I, I'$ be comparable sets, and let $p, q \in I \cap I'$. Since
$p, q \in I$ and $I$ is comparable, either $p <_I q$ or $q <_I p$; similarly,
either $p <_{I'} q$ or $q <_{I'} p$. Suppose by way of contradiction that
$p <_I q$ and $q <_{I'} p$. Due to $p <_I q$ it holds that
$time_q[q] >_q time_p[q]$ and due to $q <_{I'} p$ it holds that
$time_q[q] =_q time_p[q]$; leading to a contradiction. Thus, either $p <_I q$
and $p <_{I'} q$, or $p >_I q$ and $p >_{I'} q$.

Assume that $p <_I q$ and $p <_{I'} q$. Therefore, for all $i \in I \cup I'$ it
holds that $time_p[i] \le_i time_q[i]$, $time_p[p] = time_q[p]$ and
$time_p[q] < time_q[q]$. Thus, for all $i \in I \cap I'$ it holds that
$time_p[i] \le_i time_q[i]$, and by definition we have that
$p <_{I \cap I'} q$. Similarly, if $p >_I q$ and $p >_{I'} q$ then
$p >_{I \cap I'} q$.

It was shown that only two options exist: 1) $p <_I q$ and $p <_{I'} q$;
2) $p >_I q$ and $p >_{I'} q$. If option 1 occurs, then $p <_{I \cap I'} q$; if
option 2 occurs then $p >_{I \cap I'} q$. Thus, for any $p, q \in I \cap I'$
either $p >_{I \cap I'} q$ or $p <_{I \cap I'} q$ holds, i.e., $I \cap I'$ is
comparable. Moreover, we have shown that $p <_{I \cap I'} q$ iff $p <_I q$, as
required.

Remark 5. Notice that the proof of the lemma above also implies that $p <_I q$
iff $p <_{I'} q$.

Notice that a comparable set $I$ induces a total order among the elements in
$I$, therefore we can refer to the index of an element in $I$.

Definition 18. A node $p \in I$ is said to be the $k$th highest (in $I$) if
$|\{q \in I \mid q >_I p\}| = k - 1$. Let $I_\#(p) = k$ if $p \in I$ is the
$k$th highest in $I$.

The 1st highest in $I$ is the node that is larger than all other nodes. The 2nd
highest node in $I$ is the node that has only one node larger than it (and so
on).

Informally, we wish to show that given two intersecting comparable sets
$I, I'$, if a node $p$ is the $i$th highest item in $I$, it is at most
$i + \ell$ highest in $I'$; where $\ell$ changes according to $I, I'$. The
following lemma formally bounds the difference between $I_\#(p)$ and
$I'_\#(p)$.

Lemma 15. Let $I, I'$ be comparable sets, and denote
$\ell = |I| - |I \cap I'|$. If $p \in I \cap I'$ then
$I_\#(p) \le I'_\#(p) + \ell$.

Proof. Let $p \in I \cap I'$. By definition
$I_\#(p) = |\{q \in I \mid q >_I p\}| + 1$ and
$I'_\#(p) = |\{q \in I' \mid q >_{I'} p\}| + 1$. Therefore, it is enough to show
that $|\{q \in I \mid q >_I p\}| \le |\{q \in I' \mid q >_{I'} p\}| + \ell$.
Consider the set $A = \{q \in I \cap I' \mid q >_{I \cap I'} p\}$; by Lemma 14,
it holds that $A \subseteq \{q \in I \mid q >_I p\}$ and
$A \subseteq \{q \in I' \mid q >_{I'} p\}$. Clearly,
$\{q \in I \mid q >_I p\} - A$ contains only items in $I - I \cap I'$. Thus,
$|\{q \in I \mid q >_I p\} - A| \le |I| - |I \cap I'|$ and since
$A \subseteq \{q \in I \mid q >_I p\}$ it holds that
$|\{q \in I \mid q >_I p\}| - |A| \le |I| - |I \cap I'|$. That is,
$|\{q \in I \mid q >_I p\}| \le |A| + \ell$. Since
$A \subseteq \{q \in I' \mid q >_{I'} p\}$ it holds that
$|A| \le |\{q \in I' \mid q >_{I'} p\}|$. Thus,
$|\{q \in I \mid q >_I p\}| \le |\{q \in I' \mid q >_{I'} p\}| + \ell$, as
required.

Algorithm EnMasse /* executed on node q */

01: do forever:
        /* read all registers and initialize structures */
02:     for each node $p$, read $time_p$ and $order_p$;
03:     set $\mathcal{I} := \emptyset$;
04:     for each set $W \subseteq \mathcal{P}$ s.t. $|W| \ge n - f$ and $q \in W$:
05:         construct $I := \{time_p \mid p \in W\}$;
06:         if $W$ is comparable then $\mathcal{I} := \mathcal{I} \cup \{I\}$;
        /* decide whether to execute "update" and whether to execute "act" */
07:     if for some $I \in \mathcal{I}$, it holds that $I_\#(q) > n - 3f$ then
08:         update $time_q$, $order_q$ and "act";
09:     if $\mathcal{I} = \emptyset$ then update $time_q$, $order_q$;
10:     write $time_q$ and $order_q$;
11: od

Updating $time_q$ is done by setting $time_q[p] = label_p$, for every
$p \in \mathcal{P}$.
Updating $order_q$ consists of changing the order induced by $order_q$ such that
$label_q$ is first and for other labels the order is preserved.

Fig. 2. A self-stabilizing Byzantine tolerant algorithm ensuring en masse runs.

Corollary 1. Let $I, I'$ be comparable sets, and denote
$\ell' = \max\{|I|, |I'|\} - |I \cap I'|$. If $p \in I \cap I'$ then
$I'_\#(p) - \ell' \le I_\#(p) \le I'_\#(p) + \ell'$.
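The bounds of Lemma 15 and Corollary 1 can be sanity-checked on a simplified model. The sketch assumes that elements are plain integers and that the natural order on integers stands in for both $>_I$ and $>_{I'}$; this is a simplification justified by Lemma 14 and Remark 5, which say the two orders agree on $I \cap I'$. The function names are illustrative.

```python
from itertools import combinations


def rank(I, p):
    """I_#(p): one plus the number of elements of I that are larger than p."""
    return sum(1 for q in I if q > p) + 1


def lemma15_holds(I, J):
    """Check I_#(p) <= J_#(p) + l for every p in both sets, l = |I| - |I & J|."""
    ell = len(I) - len(I & J)
    return all(rank(I, p) <= rank(J, p) + ell for p in I & J)


def corollary1_holds(I, J):
    """Check the two-sided bound with l' = max(|I|, |J|) - |I & J|."""
    ell = max(len(I), len(J)) - len(I & J)
    return all(
        rank(J, p) - ell <= rank(I, p) <= rank(J, p) + ell for p in I & J
    )
```

Running both checks over all pairs of non-empty subsets of a small universe, as the test does, exercises the counting argument used in the proof of Lemma 15.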

6.1 Algorithm EnMasse 

The previous section proved general properties of comparable sets. It discussed
"static" sets, that do not change over time. The following algorithm considers comparable
sets that change from step to step. However, during each atomic step, the com- 
parable sets that are considered do not change, and the claims from the previous 
section hold. That is, when reasoning about the progress of the algorithm, the 
comparable sets that are considered are all "static". 

In the following algorithm, instead of storing both $label_p$ and $time_p$, each
node stores just $time_p$ and the value of $label_p$ is the entry $time_p[p]$.
In addition, during each atomic step, the entire algorithm is executed, i.e., a
node reads all time stamps and all order vectors of other nodes, and can update
its own time stamp during an atomic step.

When a node $q$ performs an update, it changes the value of $time_q$ and
$order_q$ in the following way: a) $order_q$ is updated such that $time_q[q]$ is
larger than any other label in $Labels_q$. b) $time_q[p]$ is set to be
$time_p[p]$, for all $p$. Notice that the new $order_q$ does not affect the
relative order of labels in $Labels_q$ that are not $time_q[q]$. That is, if
$l_1, l_2 \ne time_q[q]$ and $l_1 <_q l_2$ before the change of $order_q$, it
holds that $l_1 <_q l_2$ also after updating $order_q$.

Intuitively, the idea of EnMasse is to increase the time stamp of a node q 
only if q sees that most of the other nodes are ahead of q. When the time stamp 
is increased, q also performs "act". This leads to the following dynamics: a) If 
q has performed an "act" twice, i.e., updated its time stamp twice, then after 
the first update, q is ahead of all other nodes, b) However, since q is ahead of 
all non-faulty nodes, if q updates its time stamp again it must mean that many 
nodes have updated their time stamps after q's first update, i.e., between two 
"act" of q many other nodes have performed "act" as well. 

We continue with an overview of the proof. First, consider the set of non-faulty
nodes, and consider the set of time stamps of these nodes. The proof shows that
if this set is comparable for some configuration $C_r$ then it is comparable for
any configuration $C_{r'}$ where $r' > r$. Second, we consider an arbitrary
starting state, and consider the set $Y_r$ containing non-faulty nodes that have
updated their time stamp by the end of round $r$. It is shown that if
$|Y_r| \ge n - 2f$ then $|Y_{r+1}| \ge n - f$. Moreover, if $|Y_r| < n - 2f$
then $|Y_{r+1}| \ge |Y_r| + 1$. Thus, we conclude that within $O(n)$ rounds all
non-faulty nodes have performed an update.

Once all non-faulty nodes have performed an update since the starting state,
it holds that the set of all non-faulty nodes' time stamps is comparable. Thus,
during every round at least $2f$ nodes perform an update (as they see themselves
in the lower $3f$ part of the comparable set). This ensures that within jt rounds
some node will perform "act" twice. That is, there is no deadlock in the EnMasse
algorithm. To conclude the proof, it is shown that when the set of all non-faulty
nodes' values is comparable and some non-faulty node performs "act" twice, it
must be that another $n - 4f$ non-faulty nodes have performed "act" in between.

Definition 19. Let $Z$ be a set of non-faulty nodes, and consider an atomic step
on configuration $C_r$. Denote by $TS_{C_r}(Z)$ the set of time-stamps of nodes
in $Z$, as they are at the beginning of the atomic step. When the configuration
$C_r$ is clear from the context, we simply say "$Z$ is comparable", instead of
"$TS_{C_r}(Z)$ is comparable".

Lemma 16. Let $Z$ be a set of non-faulty nodes and consider any atomic step on
configuration $C_r$. If $TS_{C_r}(Z)$ is comparable, then $TS_{C_{r'}}(Z)$ is
comparable for any $r' > r$.

Proof. First, notice that whether TSc , is comparable or not depends only on 
the values of nodes in Z and is not affected by nodes not in Z . Therefore, only 
changes incurred by nodes in Z matter. Consider the first node p € Z to perform 
an update at some configuration C r », r" > r. Thus, p sets time p [q] = time q [q] 



for all nodes q E Z and also p ensures that time p [p] > p time q \p] for all nodes 
q E Z,q 7^ p. Thus, p >z q for all nodes q E Z,q 7^ p. 

Consider two nodes $q_1, q_2 \neq p$ in $Z$. $p$'s update does not change the value
of $time_{q_1}[q]$ and $time_{q_2}[q]$ for all $q \in Z$, $q \neq p$. What about the relative order
of $time_{q_1}[p]$ and $time_{q_2}[p]$? W.l.o.g. $q_1 <_Z q_2$ before $p$'s update. According to
the way $p$ changes $order_p$, after $p$'s update $time_{q_1}[p] <_p time_{q_2}[p]$. Therefore,
$q_1 <_Z q_2$.

Thus, for any pair of nodes $q_1, q_2 \in Z$ either $q_1 <_Z q_2$ or $q_2 <_Z q_1$ after $p$'s
update. Repeating the above line of proof for any node in $Z$ inductively proves
that $TS_{C_{r'}}(Z)$ is comparable.

Assume the system starts in an arbitrary state. Denote by $U_r$ the set of
non-faulty nodes that have not performed any "update" by the end of round $r$,
and by $Y_r$ the set of non-faulty nodes that have performed "update" by the end
of round $r$.

Lemma 17. The set $Y_r$ is comparable during the last configuration of round $r$.

Proof. Consider the order in which non-faulty nodes performed updates by the
end of round $r$: let $p_1$ denote the first node to perform an update, $p_2$ the second,
..., $p_m$ be the $m$th (and last) non-faulty node to perform an update. A node
may appear more than once in that order; for example, if some node was the
2nd and 5th to perform an update, $p_2 = p_5$. Denote by $A_i$ the set containing all
non-faulty nodes in $\{p_1, \ldots, p_i\}$, where $p_i$ is the $i$th node performing an update.
Clearly, $A_m = Y_r$, and it is left to show that the set of time-stamps of nodes from
$A_m$ is comparable. Notice that if $p_{i+1} \in A_i$ then $A_i = A_{i+1}$; that is, if the node
performing the "next" update has already performed an update, $A_i = A_{i+1}$.

We show something stronger: for all $0 \le i \le m$ let $C_i$ be the configuration
after $p_i$ performs an atomic step, then $TS_{C_i}(A_i)$ is comparable. The proof is by
induction on $i$. For $i = 0, 1$, clearly $A_i$ is comparable. We are left to show that if
$A_i$ is comparable so is $A_{i+1}$.

If $p_{i+1} \in A_i$ (i.e., $p_{i+1}$ already performed an update) then by Lemma 16 it
holds that $A_{i+1}$ is comparable. Consider $p_{i+1}$ such that $p_{i+1} \notin A_i$. According to
the way $p_{i+1}$ updates its registers, for any two nodes $q_1, q_2$, if $q_1 <_{A_i} q_2$ then also
$q_1 <_{A_{i+1}} q_2$. Moreover, for any node $q \in A_i$ it holds that $q <_{A_{i+1}} p_{i+1}$. Thus, for
any pair of nodes $p, q \in A_{i+1}$ either $p <_{A_{i+1}} q$ or $q <_{A_{i+1}} p$.

Thus, $A_{i+1}$ is comparable. To complete the proof, recall that $A_m = Y_r$.

Lemma 18. Let $q$ be a node in $U_r$. Whenever $q$ performs an atomic step before
the end of round $r$, it holds that $\mathcal{I} \neq \emptyset$, and for all $I \in \mathcal{I}$ it holds that
$I\#(q) \le n - 3f$.

Proof. If $\mathcal{I} = \emptyset$ during $q$'s atomic step, then $q$ will perform an update, and
$q \notin U_r$. Similarly, if for some $I \in \mathcal{I}$ it holds that $I\#(q) > n - 3f$ then $q$ will also
perform an update. Since $q \in U_r$, by definition, $q$ does not perform an update
until the end of round $r$.



Lemma 19. Let $q$ be a non-faulty node and consider $q$'s atomic step during
round $r''$, $r < r'' \le r'$. For any comparable set $I \in \mathcal{I}$ that $q$ considers in Line 07,
and for any $p' \in U_{r'}$, $p'' \in Y_r$: if $p', p'' \in I$ then $I\#(p'') < I\#(p')$.

Proof. Since $p', p''$ are both in $I$, either $p' <_I p''$ or $p'' <_I p'$. Since $p' \in U_{r'}$ it has
not yet performed an update, while $p''$ has performed an update before the end
of round $r$. Thus, $time_{p''}[p'] = time_{p'}[p']$ and therefore it cannot be that $p'' <_I p'$,
leaving us with $p' <_I p''$. Since $I$ is totally ordered, any node $w$ such that $p'' <_I w$
also satisfies $p' <_I w$. Therefore, there are more nodes in $I$ that are larger than
$p'$ than there are nodes that are larger than $p''$; i.e., $I\#(p'') < I\#(p')$.

Lemma 20. If $|Y_r| \ge n - 2f$ then $|Y_{r+1}| \ge n - f$.

Proof. If $U_{r+1} = \emptyset$ we are done. Otherwise, assume by way of contradiction that
$q \in U_{r+1}$. By Lemma 18 during $q$'s atomic steps in round $r + 1$ it holds that $\mathcal{I} \neq \emptyset$
and for all $I \in \mathcal{I}$ we have that $I\#(q) \le n - 3f$. Consider such an $I \in \mathcal{I}$; since
$|Y_r| \ge n - 2f$, it holds that $|I \cap Y_r| \ge n - 3f$. That is, $I$ contains at least $n - 3f$
nodes that have performed an update. Thus, by Lemma 19, $I\#(q) > n - 3f$, which
contradicts the fact that $I\#(q) \le n - 3f$. Thus, $q \notin U_{r+1}$ and $U_{r+1} = \emptyset$.
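
The counting step in this proof can be spelled out as follows. This is only a sketch: it assumes, as in the proof of Lemma 21, that a set $I$ considered in Line 07 contains at least $n - f$ nodes, and it uses the hypothesis $|Y_r| \ge n - 2f$, so that at most $2f$ of the $n$ nodes lie outside $Y_r$.

```latex
\begin{align*}
|I \cap Y_r| &\ge |I| - \bigl(n - |Y_r|\bigr) \\
             &\ge (n - f) - \bigl(n - (n - 2f)\bigr) \\
             &=   (n - f) - 2f \\
             &=   n - 3f .
\end{align*}
```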

Lemma 21. If $|Y_r| < n - 2f$ then $|Y_{r+1}| \ge |Y_r| + 1$.

Proof. Assume by way of contradiction that $|Y_{r+1}| < |Y_r| + 1$. Therefore, $|Y_{r+1}| \le
|Y_r| < n - 2f$ and $|U_{r+1}| \ge f + 1$. Thus, for any set $I$ containing at least $n - f$
nodes, it holds that $I \cap U_{r+1} \neq \emptyset$.

Let $q \in U_{r+1}$; by Lemma 18 during $q$'s atomic steps in round $r + 1$ it holds that
$\mathcal{I} \neq \emptyset$ and for all $I \in \mathcal{I}$ we have that $I\#(q) \le n - 3f$. By Lemma 19, for any node
$q' \in U_{r+1}$ and any node $p' \in Y_r$ it holds that if $q', p' \in I$ then $I\#(p') < I\#(q')$.
Therefore, for any node $p \in I \cap U_{r+1}$ it holds that $I\#(p) > |I \cap Y_r|$. As there
are $|I \cap U_{r+1}| > 0$ nodes from $U_{r+1}$ in $I$, there is a node $p \in I \cap U_{r+1}$ such that
$I\#(p) \ge |I \cap Y_r| + |I \cap U_{r+1}|$.

Notice that $Y_r \subseteq Y_{r+1}$, and since $|Y_{r+1}| \le |Y_r|$ it holds that $Y_r = Y_{r+1}$.
Moreover, $U_{r+1} \cap Y_{r+1} = \emptyset$ and $|U_{r+1} \cup Y_{r+1}| = n - f$. Thus, $|I \cap Y_r| + |I \cap
U_{r+1}| = |I \cap (Y_r \cup U_{r+1})| \ge n - 2f$. That is, there is a node $p \in I \cap U_{r+1}$
such that $I\#(p) \ge n - 2f$; i.e., there are at most $3f - 1$ nodes $p'$ that have the
following property: $time_{p'}[p'] <_{p'} time_p[p']$. Since $p$ does not perform an update,
this property can change only if $p'$ performs an update, which will reduce the
number of nodes such that $time_{p'}[p'] <_{p'} time_p[p']$. Therefore, throughout round
$r + 1$, if $p$ considers a comparable set $I \in \mathcal{I}$, it will always have $I\#(p) > n - 3f$.

Consider an atomic step by $p$ during round $r + 1$; by Lemma 18, for any $I \in \mathcal{I}$
that $p$ considers in Line 07, $I\#(p) \le n - 3f$, which contradicts the above fact
that $I\#(p) > n - 3f$. Thus, we conclude that $|Y_{r+1}| > |Y_r|$, as required.

Corollary 2. For any round $r > n - 2f + 2$, it holds that $U_r = \emptyset$ and $|Y_r| = n - f$.



Proof. Apply Lemma 21 during the first $n - 2f$ rounds, then apply Lemma 20
for round $n - 2f + 1$.
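
The round count in this proof can be checked with a small worst-case iteration. The sketch below is only an illustration: the function name `rounds_until_all_updated` is hypothetical, and it assumes the slowest growth allowed by Lemma 21 (exactly one new node per round), starting from no node having updated.

```python
def rounds_until_all_updated(n: int, f: int, y0: int = 0) -> int:
    """Worst-case number of rounds until |Y_r| = n - f (hypothetical helper)."""
    y = y0          # |Y_r|: non-faulty nodes that have already updated
    rounds = 0
    while y < n - f:
        rounds += 1
        if y < n - 2 * f:
            y += 1          # Lemma 21: at least one more node updates
        else:
            y = n - f       # Lemma 20: every remaining node updates
    return rounds


if __name__ == "__main__":
    n, f = 13, 1            # example values chosen so that n > 12f
    print(rounds_until_all_updated(n, f), n - 2 * f + 1)
```

Under these assumptions the loop stops after $n - 2f + 1$ rounds, which is within the round bound stated in Corollary 2.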

Lemma 22. For any round $r > n - 2f + 2$, any non-faulty node $q$ considers
$Y_r \in \mathcal{I}$ during atomic steps of round $r$.

Proof. By Corollary 2, $|Y_{r-1}| = n - f$ and $U_{r-1} = \emptyset$, leading to $q \in Y_{r-1}$. By
Lemma 17 and Lemma 16, $Y_r$ is comparable, and is viewed as comparable by $q$
during any atomic step of round $r$. Thus, when $q$ performs an atomic step during
round $r$, $q$ adds $Y_r$ to $\mathcal{I}$ at Line 06.

Corollary 3. For any round $r > n - 2f + 2$, no non-faulty node passes the
condition of Line 09.

Proof. By Lemma 22, a non-faulty node $q$ performing an atomic step during
round $r > n - 2f + 2$ has $\mathcal{I} \neq \emptyset$. Thus, the condition of Line 09 does not hold.

Lemma 23. Let $r$ be any round, $r > n - 2f + 2$. During round $r$ at least $2f$
non-faulty nodes perform an update.

Proof. Let $r > n - 2f + 2$ be any round. Consider the set $W = \{time_q \mid q \in Y_r\}$
before the first atomic step of round $r$. Denote by $Z = \{p \in W \mid W\#(p) > n - 3f\}$,
that is, $Z$ contains all nodes that are not among the $n - 3f$ highest in $W$. Since $|W| =
n - f$ it holds that $|Z| = 2f$.

For each node $q \in Z$, consider $q$'s first atomic step in round $r$. By the proof
of Lemma 17, since $q$ did not perform an update since the beginning of round
$r$, when it performs its first atomic step, it holds that $W\#(q) > n - 3f$ and $q$
will pass the condition in Line 07 and perform an update (and "act") in Line 08.
This holds for all $q \in Z$, that is, for at least $2f$ nodes.
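
The size of $Z$ can be illustrated with a short sketch. It assumes that $W\#(p)$ is the 1-based rank of $p$ counted from the highest time-stamp (consistent with $Y_r\#(p) = 1$ for a node that has just updated, as used in Lemma 25); the function `lowest_nodes` and the node labels are hypothetical.

```python
def lowest_nodes(order_high_to_low, n, f):
    """Return the nodes whose rank from the top exceeds n - 3f.

    order_high_to_low: node ids sorted from highest to lowest time-stamp
    (an assumed representation of the comparable set W).
    """
    threshold = n - 3 * f
    return [node
            for rank, node in enumerate(order_high_to_low, start=1)
            if rank > threshold]


if __name__ == "__main__":
    n, f = 13, 1                                 # example values with n > 12f
    nodes = [f"v{i}" for i in range(n - f)]      # the n - f non-faulty nodes
    z = lowest_nodes(nodes, n, f)
    print(z, len(z) == 2 * f)
```

With $n - f$ nodes in the order and a threshold of $n - 3f$, exactly $(n - f) - (n - 3f) = 2f$ nodes fall into the returned list.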

Lemma 24. Starting from round $n - 2f + 3$, every non-faulty node $p$ performs
an update at least once every $O(n/f)$ rounds.

Proof. By the proof of Lemma 23, every round the lowest $2f$ nodes perform an
update. Thus, if $p$ does not perform an update during round $r$, there are at least
$2f$ nodes higher than it. Consider round $r + i$; if $p$ does not perform an update
there are $2f \cdot i$ nodes higher than $p$. Therefore, after at most $O(n/f)$ rounds $p$ will
perform an update. 
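
The rotation argument can be illustrated with a toy simulation. The sketch below is an idealized model, not the EnMasse algorithm itself: it assumes that in every round exactly the lowest $2f$ of the $n - f$ non-faulty nodes update and move to the top of the order, and it ignores Byzantine nodes. The function name `max_gap_between_updates` is hypothetical.

```python
from collections import deque


def max_gap_between_updates(n: int, f: int, rounds: int = 200) -> int:
    """Largest number of rounds any node waits between two updates."""
    m = n - f                     # number of non-faulty nodes
    k = 2 * f                     # nodes that update in each round
    order = deque(range(m))       # index 0 is the lowest node in the order
    last_update = {node: 0 for node in order}
    max_gap = 0
    for r in range(1, rounds + 1):
        for _ in range(k):
            node = order.popleft()            # lowest node performs an update
            max_gap = max(max_gap, r - last_update[node])
            last_update[node] = r
            order.append(node)                # it becomes the highest node
    return max_gap


if __name__ == "__main__":
    for n, f in [(13, 1), (14, 1), (26, 2)]:
        print(n, f, max_gap_between_updates(n, f))
```

In this model a node waits at most $\lceil (n - f) / (2f) \rceil$ rounds between updates, which is of order $n/f$.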

Lemma 25. Consider a non-faulty node $p$ performing an update twice; then
there are at least $n - 4f$ other nodes that have performed an update in between.

Proof. Consider the comparable set $Y_r$ after $p$'s first update. By the way $p$ does an
update, $Y_r\#(p) = 1$. When $p$ performs its second update, it has some comparable
set $I \in \mathcal{I}$, such that $I\#(p) > n - 3f$. Therefore, at least $n - 4f$ non-faulty nodes
have become larger than $p$. Thus, they all must have performed an update.
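
The count in this proof can be written out explicitly. The derivation below is a sketch resting on two assumptions that the proof leaves implicit: that $I\#(p)$ counts $p$ itself (so that $Y_r\#(p) = 1$ means no node is above $p$), and that at most $f$ of the nodes in $I$ are faulty.

```latex
\begin{align*}
\#\{\text{nodes of } I \text{ above } p\}
  &= I\#(p) - 1 \;\ge\; (n - 3f + 1) - 1 \;=\; n - 3f, \\
\#\{\text{non-faulty nodes of } I \text{ above } p\}
  &\ge (n - 3f) - f \;=\; n - 4f .
\end{align*}
```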



Theorem 2. Starting from round $n - 2f + 3$, between any non-faulty node's
two consecutive "act"s, there are $n - 4f$ non-faulty nodes that perform "act".
Moreover, every non-faulty node performs an "act" at least once every $O(n/f)$ rounds.

Proof. By Corollary 3, non-faulty nodes perform update only if they also perform 
an "act". By Lemma 25, between a non-faulty node's two consecutive updates 
there are $n - 4f$ non-faulty nodes that perform an update. By Lemma 24 every
non-faulty node performs an update at least once every $O(n/f)$ rounds. Combining
these two claims yields the required result. 

Theorem 2 states that using EnMasse one can ensure that nodes executing 
Async-Clock will have the following properties: 1) every non-faulty node p 
executes an atomic step of Async-Clock once every $O(n/f)$ rounds; 2) if non-faulty
$p$ executes 2 atomic steps of Async-Clock, then at least $n - 4f$ non-faulty
nodes execute atomic steps of Async-Clock in between. By setting $n > 12f$,
these properties ensure that a fair run T is an en-masse run T' with respect to 
Async-Clock, s.t. each round of T' consists of at most $O(n/f)$ rounds of T.

7 Discussion 

7.1 Solving the 1-Clock-Synchronization Problem 

First, the 5-clock-synchronization problem was solved using Async-Clock while 
assuming en masse runs. Second, the assumption of en masse runs was removed 
in Section 6. In this subsection we complete the paper's result by showing how 
to transform a 5-clock-synchronization algorithm to a 1-clock-synchronization 
algorithm. 

Given any algorithm $A$ that solves the $\ell$-clock-synchronization problem, one
can construct an algorithm $A'$ that solves the 1-clock-synchronization problem.
Denote by $k_{A'}$ the desired wraparound value of $A'$, and let $k_A = k_{A'} \cdot \ell$ be the
wraparound value for $A$.

The construction is simple: each time $A'$ is executed, it runs $A$ and returns
the clock value of $A$ divided by $\ell$ (that is, $\lfloor \frac{\cdot}{\ell} \rfloor$). The intuition behind this
construction is straightforward: $A$ solves the $\ell$-clock-synchronization problem,
thus, the values it returns are at most $\ell$ apart. Therefore, the values that $A'$
returns are at most 1 apart from each other.
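
A minimal sketch of this construction follows. It assumes that $A$ is exposed as a zero-argument callable returning an integer clock value in $[0, k_A)$; the names `make_one_clock` and `circular_distance` are hypothetical and are not part of Async-Clock.

```python
def make_one_clock(read_l_clock, ell):
    """Wrap an l-synchronized clock A into a clock A' (hypothetical interface).

    read_l_clock: callable returning A's clock value, an int in [0, k_A)
    ell:          the synchronization bound of A, with k_A = k_A' * ell
    """
    def read_one_clock():
        # A' runs A and returns floor(value / ell)
        return read_l_clock() // ell
    return read_one_clock


def circular_distance(a, b, k):
    """Distance between two clock values modulo the wraparound value k."""
    d = (a - b) % k
    return min(d, k - d)


if __name__ == "__main__":
    ell, k_prime = 5, 4
    k = k_prime * ell                      # wraparound value of A
    a, b = 18, (18 + ell) % k              # two readings of A, ell apart
    print(circular_distance(a // ell, b // ell, k_prime))
```

Because $k_A$ is a multiple of $\ell$, the floor division maps the range $[0, k_A)$ onto $[0, k_{A'})$ and stays consistent across the wraparound point.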

7.2 Future Work 

The current paper has a few drawbacks, each of which is interesting to resolve. 

First, is it possible to reduce the atomicity requirements; that is, can an 
atomic step be defined as a single read or a single write (and not as "read all 
registers and write all registers")? 



Second, can the current algorithm be transported into a message passing 
model? 

Third, can different coin-flipping algorithms that operate in the asynchronous
setting (e.g., [6]) be used to reduce the exponential convergence time to something
more reasonable? Perhaps even expected constant time?

Fourth, can the ratio between Byzantine and non-Byzantine nodes be
reduced? I.e., can $n > 3f$ be achieved?

Fifth, can the problem of asynchronous Byzantine agreement be reduced to 
the problem of clock synchronization presented in the current work? (This will 
show that the expected exponential convergence time is as good as is currently 
known). 

Lastly, the building block EnMasse is interesting by itself. It would be
interesting to find a polynomial solution to EnMasse.